From patchwork Thu Feb 2 18:28:15 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13126634
Reply-To: Sean Christopherson
Date: Thu, 2 Feb 2023 18:28:15 +0000
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182817.407394-2-seanjc@google.com>
Subject: [PATCH v2 1/3] KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pages
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
X-Mailing-List: kvm@vger.kernel.org

Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults
on self-changing writes to shadowed page tables instead of propagating
that information to the emulator via a semi-persistent vCPU flag.  Using
a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented,
as it's not at all obvious that clearing the flag only when emulation
actually occurs is correct.  E.g. if KVM sets the flag and then retries
the fault without ever getting to the emulator, the flag will be left
set for future calls into the emulator.

But because the flag is consumed if and only if both EMULTYPE_PF and
EMULTYPE_ALLOW_RETRY_PF are set, and because EMULTYPE_ALLOW_RETRY_PF is
deliberately not set for direct MMUs, emulated MMIO, or while L2 is
active, KVM avoids false positives on a stale flag since
FNAME(page_fault) is guaranteed to be run and refresh the flag before
it's ultimately consumed by the tail end of reexecute_instruction().

Signed-off-by: Sean Christopherson
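Condensed into one place, the new flow is: the per-fault struct carries the
information, the MMU fault handler ORs it into the caller's emulation_type,
and the emulator's failure path checks the bit.  The snippet below is a rough
standalone model of that pattern (ordinary userspace C, not KVM code; the
flag and field names mirror the patch, everything else is illustrative):

  /*
   * Model of the pattern this patch adopts: per-fault information is
   * returned via an optional bitmask out-parameter instead of being
   * stashed in long-lived vCPU state, so a retried fault can never
   * leave a stale flag behind.
   */
  #include <stdbool.h>
  #include <stdio.h>

  #define EMULTYPE_PF             (1 << 6)
  #define EMULTYPE_WRITE_PF_TO_SP (1 << 8)

  struct fault {
      bool write_fault_to_shadow_pgtable;   /* lives only for this fault */
  };

  static int handle_page_fault(bool self_changing_write, int *emulation_type)
  {
      struct fault fault = { .write_fault_to_shadow_pgtable = self_changing_write };

      /* Report the condition to the caller, if the caller cares. */
      if (fault.write_fault_to_shadow_pgtable && emulation_type)
          *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;

      return 0;
  }

  int main(void)
  {
      int emulation_type = EMULTYPE_PF;

      handle_page_fault(true, &emulation_type);
      printf("retry on emulation failure? %s\n",
             (emulation_type & EMULTYPE_WRITE_PF_TO_SP) ?
             "no, exit to userspace" : "yes");
      return 0;
  }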
---
 arch/x86/include/asm/kvm_host.h | 37 ++++++++++++++++++---------------
 arch/x86/kvm/mmu/mmu.c          |  5 +++--
 arch/x86/kvm/mmu/mmu_internal.h | 12 ++++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  4 +---
 arch/x86/kvm/x86.c              | 15 ++-----------
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..a0fa6333edbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -942,23 +942,6 @@ struct kvm_vcpu_arch {
 
 	u64 msr_kvm_poll_control;
 
-	/*
-	 * Indicates the guest is trying to write a gfn that contains one or
-	 * more of the PTEs used to translate the write itself, i.e. the access
-	 * is changing its own translation in the guest page tables.  KVM exits
-	 * to userspace if emulation of the faulting instruction fails and this
-	 * flag is set, as KVM cannot make forward progress.
-	 *
-	 * If emulation fails for a write to guest page tables, KVM unprotects
-	 * (zaps) the shadow page for the target gfn and resumes the guest to
-	 * retry the non-emulatable instruction (on hardware).  Unprotecting the
-	 * gfn doesn't allow forward progress for a self-changing access because
-	 * doing so also zaps the translation for the gfn, i.e. retrying the
-	 * instruction will hit a !PRESENT fault, which results in a new shadow
-	 * page and sends KVM back to square one.
-	 */
-	bool write_fault_to_shadow_pgtable;
-
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;
 
@@ -1891,6 +1874,25 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
  *			     state and inject single-step #DBs after skipping
  *			     an instruction (after completing userspace I/O).
+ *
+ * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
+ *			     is attempting to write a gfn that contains one or
+ *			     more of the PTEs used to translate the write itself,
+ *			     and the owning page table is being shadowed by KVM.
+ *			     If emulation of the faulting instruction fails and
+ *			     this flag is set, KVM will exit to userspace instead
+ *			     of retrying emulation as KVM cannot make forward
+ *			     progress.
+ *
+ *			     If emulation fails for a write to guest page tables,
+ *			     KVM unprotects (zaps) the shadow page for the target
+ *			     gfn and resumes the guest to retry the non-emulatable
+ *			     instruction (on hardware).  Unprotecting the gfn
+ *			     doesn't allow forward progress for a self-changing
+ *			     access because doing so also zaps the translation for
+ *			     the gfn, i.e. retrying the instruction will hit a
+ *			     !PRESENT fault, which results in a new shadow page
+ *			     and sends KVM back to square one.
  */
 #define EMULTYPE_NO_DECODE	    (1 << 0)
 #define EMULTYPE_TRAP_UD	    (1 << 1)
@@ -1900,6 +1902,7 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_VMWARE_GP	    (1 << 5)
 #define EMULTYPE_PF		    (1 << 6)
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
+#define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)
 
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..bf38575a1957 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4203,7 +4203,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	    work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
 		return;
 
-	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
+	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -5664,7 +5664,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 
 	if (r == RET_PF_INVALID) {
 		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false);
+					  lower_32_bits(error_code), false,
+					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
 	}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cc58631e2336..2cbb155c686c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -240,6 +240,13 @@ struct kvm_page_fault {
 	kvm_pfn_t pfn;
 	hva_t hva;
 	bool map_writable;
+
+	/*
+	 * Indicates the guest is trying to write a gfn that contains one or
+	 * more of the PTEs used to translate the write itself, i.e. the access
+	 * is changing its own translation in the guest page tables.
+	 */
+	bool write_fault_to_shadow_pgtable;
 };
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
@@ -273,7 +280,7 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch)
+					u32 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
@@ -312,6 +319,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
 
+	if (fault.write_fault_to_shadow_pgtable && emulation_type)
+		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
+
 	/*
 	 * Similar to above, prefetch faults aren't truly spurious, and the
 	 * async #PF path doesn't do emulation.  Do count faults that are fixed
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9..5d2958299b4f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -825,10 +825,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->user, &fault->write_fault_to_shadow_pgtable);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..de2a0d1c9c21 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8427,7 +8427,6 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 }
 
 static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				  bool write_fault_to_shadow_pgtable,
 				  int emulation_type)
 {
 	gpa_t gpa = cr2_or_gpa;
@@ -8498,7 +8497,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * be fixed by unprotecting shadow page and it should
 	 * be reported to userspace.
 	 */
-	return !write_fault_to_shadow_pgtable;
+	return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
 }
 
 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
@@ -8746,20 +8745,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	int r;
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	bool writeback = true;
-	bool write_fault_to_spt;
 
 	if (unlikely(!kvm_can_emulate_insn(vcpu, emulation_type, insn, insn_len)))
 		return 1;
 
 	vcpu->arch.l1tf_flush_l1d = true;
 
-	/*
-	 * Clear write_fault_to_shadow_pgtable here to ensure it is
-	 * never reused.
-	 */
-	write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	if (!(emulation_type & EMULTYPE_NO_DECODE)) {
 		kvm_clear_exception_queue(vcpu);
 
@@ -8780,7 +8771,6 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				return 1;
 			}
 			if (reexecute_instruction(vcpu, cr2_or_gpa,
-						  write_fault_to_spt,
 						  emulation_type))
 				return 1;
 
@@ -8859,8 +8849,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		return 1;
 
 	if (r == EMULATION_FAILED) {
-		if (reexecute_instruction(vcpu, cr2_or_gpa, write_fault_to_spt,
-					  emulation_type))
+		if (reexecute_instruction(vcpu, cr2_or_gpa, emulation_type))
 			return 1;
 
 		return handle_emulation_failure(vcpu, emulation_type);

From patchwork Thu Feb 2 18:28:16 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13126635
Reply-To: Sean Christopherson
Date: Thu, 2 Feb 2023 18:28:16 +0000
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182817.407394-3-seanjc@google.com>
Subject: [PATCH v2 2/3] KVM: x86/mmu: Detect write #PF to shadow pages during FNAME(fetch) walk
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
X-Mailing-List: kvm@vger.kernel.org

From: Lai Jiangshan

Move the detection of write #PF to shadow pages, i.e. a fault on a write
to a page table that is being shadowed by KVM and that is used to
translate the write itself, from FNAME(is_self_change_mapping) to
FNAME(fetch).  There is no need to detect the self-referential write
before kvm_faultin_pfn() as KVM does not consume EMULTYPE_WRITE_PF_TO_SP
for accesses that resolve to "error or no-slot" pfns, i.e. KVM doesn't
allow retrying MMIO accesses or writes to read-only memslots.

Detecting the EMULTYPE_WRITE_PF_TO_SP scenario in FNAME(fetch) will allow
dropping FNAME(is_self_change_mapping) entirely, as the hugepage
interaction can be deferred to kvm_mmu_hugepage_adjust().

Cc: Huang Hang
Signed-off-by: Lai Jiangshan
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson
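The heart of the change is the per-level comparison added to the fetch walk.
As a rough standalone illustration (ordinary userspace C, not KVM code; the
gfn values are made up), the detection boils down to flagging a write whose
target gfn matches the gfn of any page table used to translate that write:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t gfn_t;

  struct fault {
      gfn_t gfn;                              /* gfn the guest is writing */
      bool write;
      bool write_fault_to_shadow_pgtable;
  };

  /* Model of the walk: one table gfn per guest page-table level. */
  static void fetch_walk(struct fault *fault, const gfn_t *table_gfns, int levels)
  {
      for (int level = 0; level < levels; level++) {
          gfn_t table_gfn = table_gfns[level];

          /* ...shadow page for this level would be linked here... */

          if (fault->write && table_gfn == fault->gfn)
              fault->write_fault_to_shadow_pgtable = true;
      }
  }

  int main(void)
  {
      gfn_t tables[] = { 0x100, 0x200, 0x300 };          /* hypothetical table gfns */
      struct fault f = { .gfn = 0x200, .write = true };  /* write hits its own table */

      fetch_walk(&f, tables, 3);
      printf("self-referential write: %s\n",
             f.write_fault_to_shadow_pgtable ? "yes" : "no");
      return 0;
  }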
---
 arch/x86/kvm/mmu/paging_tmpl.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5d2958299b4f..f57d9074fb9b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -685,6 +685,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 
 		if (sp != ERR_PTR(-EEXIST))
 			link_shadow_page(vcpu, it.sptep, sp);
+
+		if (fault->write && table_gfn == fault->gfn)
+			fault->write_fault_to_shadow_pgtable = true;
 	}
 
 	kvm_mmu_hugepage_adjust(vcpu, fault);
@@ -741,17 +744,13 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
  * created when kvm establishes shadow page table that stop kvm using large
  * page size. Do it early can avoid unnecessary #PF and emulation.
  *
- * @write_fault_to_shadow_pgtable will return true if the fault gfn is
- * currently used as its page table.
- *
  * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
  * since the PDPT is always shadowed, that means, we can not use large page
  * size to map the gfn which is used as PDPT.
  */
 static bool
 FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
-			      struct guest_walker *walker, bool user_fault,
-			      bool *write_fault_to_shadow_pgtable)
+			      struct guest_walker *walker, bool user_fault)
 {
 	int level;
 	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
@@ -765,7 +764,6 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
 		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
 
 		self_changed |= !(gfn & mask);
-		*write_fault_to_shadow_pgtable |= !gfn;
 	}
 
 	return self_changed;
@@ -826,7 +824,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return r;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &fault->write_fault_to_shadow_pgtable);
+	      &walker, fault->user);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;

From patchwork Thu Feb 2 18:28:17 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13126637
Reply-To: Sean Christopherson
Date: Thu, 2 Feb 2023 18:28:17 +0000
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182817.407394-4-seanjc@google.com>
Subject: [PATCH v2 3/3] KVM: x86/mmu: Remove FNAME(is_self_change_mapping)
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
X-Mailing-List: kvm@vger.kernel.org

From: Lai Jiangshan

Drop FNAME(is_self_change_mapping) and instead rely on
kvm_mmu_hugepage_adjust() to adjust the hugepage accordingly.  Prior to
commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a
level"), the hugepage adjustment was done before allocating new shadow
pages, i.e. failed to restrict the hugepage sizes if a new shadow page
resulted in account_shadowed() changing the disallowed hugepage tracking.

Removing FNAME(is_self_change_mapping) fixes a bug reported by Huang Hang
where KVM unnecessarily forces a 4KiB page.  FNAME(is_self_change_mapping)
has a defect in that it blindly disables _all_ hugepage mappings rather
than trying to reduce the size of the hugepage.  If the guest is writing
to a 1GiB page and the 1GiB region is self-referential but a 2MiB region
is not, then KVM can and should create a 2MiB mapping.

Add a comment above the call to kvm_mmu_hugepage_adjust() to call out the
new dependency on adjusting the hugepage size after walking indirect PTEs.

Reported-by: Huang Hang
Signed-off-by: Lai Jiangshan
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: rework changelog after separating out the emulator change]
Signed-off-by: Sean Christopherson
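The net effect is that the mapping size is now reduced only as far as
necessary rather than unconditionally dropped to 4KiB.  The snippet below is
a loose, self-contained model of that outcome (ordinary userspace C, not
KVM's actual bookkeeping, which relies on account_shadowed() and the
disallowed-hugepage tracking noted above; levels and gfn values are made up):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t gfn_t;

  /* Level 1 = 4KiB, 2 = 2MiB, 3 = 1GiB; 512 entries per level as on x86-64. */
  static gfn_t pages_per_level(int level)
  {
      gfn_t n = 1;
      while (--level)
          n *= 512;
      return n;
  }

  static bool range_contains(gfn_t base, gfn_t npages, gfn_t gfn)
  {
      return gfn >= base && gfn < base + npages;
  }

  /* Pick the largest level whose range doesn't cover a shadowed table gfn. */
  static int adjust_level(gfn_t fault_gfn, gfn_t table_gfn, int max_level)
  {
      for (int level = max_level; level > 1; level--) {
          gfn_t npages = pages_per_level(level);
          gfn_t base = fault_gfn & ~(npages - 1);

          if (!range_contains(base, npages, table_gfn))
              return level;
      }
      return 1;
  }

  int main(void)
  {
      gfn_t fault_gfn = 0x40123;  /* write target */
      gfn_t table_gfn = 0x7f000;  /* page table shadowed by KVM */

      /* The 1GiB range covers the table, the 2MiB range does not -> level 2. */
      printf("mapping level: %d\n", adjust_level(fault_gfn, table_gfn, 3));
      return 0;
  }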
---
 arch/x86/kvm/mmu/paging_tmpl.h | 51 +++++-----------------------------
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f57d9074fb9b..a056f2773dd9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -690,6 +690,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			fault->write_fault_to_shadow_pgtable = true;
 	}
 
+	/*
+	 * Adjust the hugepage size _after_ resolving indirect shadow pages.
+	 * KVM doesn't support mapping hugepages into the guest for gfns that
+	 * are being shadowed by KVM, i.e. allocating a new shadow page may
+	 * affect the allowed hugepage size.
+	 */
 	kvm_mmu_hugepage_adjust(vcpu, fault);
 
 	trace_kvm_mmu_spte_requested(fault);
@@ -734,41 +740,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	return RET_PF_RETRY;
 }
 
-/*
- * To see whether the mapped gfn can write its page table in the current
- * mapping.
- *
- * It is the helper function of FNAME(page_fault). When guest uses large page
- * size to map the writable gfn which is used as current page table, we should
- * force kvm to use small page size to map it because new shadow page will be
- * created when kvm establishes shadow page table that stop kvm using large
- * page size. Do it early can avoid unnecessary #PF and emulation.
- *
- * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
- * since the PDPT is always shadowed, that means, we can not use large page
- * size to map the gfn which is used as PDPT.
- */
-static bool
-FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
-			      struct guest_walker *walker, bool user_fault)
-{
-	int level;
-	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
-	bool self_changed = false;
-
-	if (!(walker->pte_access & ACC_WRITE_MASK ||
-	      (!is_cr0_wp(vcpu->arch.mmu) && !user_fault)))
-		return false;
-
-	for (level = walker->level; level <= walker->max_level; level++) {
-		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
-
-		self_changed |= !(gfn & mask);
-	}
-
-	return self_changed;
-}
-
 /*
  * Page fault handler. There are several causes for a page fault:
  *   - there is no shadow pte for the guest pte
@@ -787,7 +758,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 {
 	struct guest_walker walker;
 	int r;
-	bool is_self_change_mapping;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
@@ -812,6 +782,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	}
 
 	fault->gfn = walker.gfn;
+	fault->max_level = walker.level;
 	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
 
 	if (page_fault_handle_page_track(vcpu, fault)) {
@@ -823,14 +794,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user);
-
-	if (is_self_change_mapping)
-		fault->max_level = PG_LEVEL_4K;
-	else
-		fault->max_level = walker.level;
-
 	r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
 	if (r != RET_PF_CONTINUE)
 		return r;