From patchwork Thu Feb 2 18:28:15 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13126634
Reply-To: Sean Christopherson
Date: Thu, 2 Feb 2023 18:28:15 +0000
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182817.407394-2-seanjc@google.com>
Subject: [PATCH v2 1/3] KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pages
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
X-Mailing-List: kvm@vger.kernel.org

Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults
on self-changing writes to shadowed page tables instead of propagating
that information to the emulator via a semi-persistent vCPU flag.  Using
a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented,
as it's not at all obvious that clearing the flag only when emulation
actually occurs is correct.  E.g. if KVM sets the flag and then retries
the fault without ever getting to the emulator, the flag will be left
set for future calls into the emulator.

But because the flag is consumed if and only if both EMULTYPE_PF and
EMULTYPE_ALLOW_RETRY_PF are set, and because EMULTYPE_ALLOW_RETRY_PF is
deliberately not set for direct MMUs, emulated MMIO, or while L2 is
active, KVM avoids false positives on a stale flag since
FNAME(page_fault) is guaranteed to be run and refresh the flag before
it's ultimately consumed by the tail end of reexecute_instruction().

Signed-off-by: Sean Christopherson
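Condensed into one place, the new flow is: the per-fault struct carries the
information, the MMU fault handler ORs it into the caller's emulation_type,
and the emulator's failure path checks the bit.  The snippet below is a rough
standalone model of that pattern (ordinary userspace C, not KVM code; the
flag and field names mirror the patch, everything else is illustrative):

  /*
   * Model of the pattern this patch adopts: per-fault information is
   * returned via an optional bitmask out-parameter instead of being
   * stashed in long-lived vCPU state, so a retried fault can never
   * leave a stale flag behind.
   */
  #include <stdbool.h>
  #include <stdio.h>

  #define EMULTYPE_PF             (1 << 6)
  #define EMULTYPE_WRITE_PF_TO_SP (1 << 8)

  struct fault {
      bool write_fault_to_shadow_pgtable;   /* lives only for this fault */
  };

  static int handle_page_fault(bool self_changing_write, int *emulation_type)
  {
      struct fault fault = { .write_fault_to_shadow_pgtable = self_changing_write };

      /* Report the condition to the caller, if the caller cares. */
      if (fault.write_fault_to_shadow_pgtable && emulation_type)
          *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;

      return 0;
  }

  int main(void)
  {
      int emulation_type = EMULTYPE_PF;

      handle_page_fault(true, &emulation_type);
      printf("retry on emulation failure? %s\n",
             (emulation_type & EMULTYPE_WRITE_PF_TO_SP) ?
             "no, exit to userspace" : "yes");
      return 0;
  }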
---
 arch/x86/include/asm/kvm_host.h | 37 ++++++++++++++++++---------------
 arch/x86/kvm/mmu/mmu.c          |  5 +++--
 arch/x86/kvm/mmu/mmu_internal.h | 12 ++++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  4 +---
 arch/x86/kvm/x86.c              | 15 ++-----------
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..a0fa6333edbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -942,23 +942,6 @@ struct kvm_vcpu_arch {
 
 	u64 msr_kvm_poll_control;
 
-	/*
-	 * Indicates the guest is trying to write a gfn that contains one or
-	 * more of the PTEs used to translate the write itself, i.e. the access
-	 * is changing its own translation in the guest page tables.  KVM exits
-	 * to userspace if emulation of the faulting instruction fails and this
-	 * flag is set, as KVM cannot make forward progress.
-	 *
-	 * If emulation fails for a write to guest page tables, KVM unprotects
-	 * (zaps) the shadow page for the target gfn and resumes the guest to
-	 * retry the non-emulatable instruction (on hardware).  Unprotecting the
-	 * gfn doesn't allow forward progress for a self-changing access because
-	 * doing so also zaps the translation for the gfn, i.e. retrying the
-	 * instruction will hit a !PRESENT fault, which results in a new shadow
-	 * page and sends KVM back to square one.
-	 */
-	bool write_fault_to_shadow_pgtable;
-
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;
 
@@ -1891,6 +1874,25 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
  *			     state and inject single-step #DBs after skipping
  *			     an instruction (after completing userspace I/O).
+ *
+ * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
+ *			     is attempting to write a gfn that contains one or
+ *			     more of the PTEs used to translate the write itself,
+ *			     and the owning page table is being shadowed by KVM.
+ *			     If emulation of the faulting instruction fails and
+ *			     this flag is set, KVM will exit to userspace instead
+ *			     of retrying emulation as KVM cannot make forward
+ *			     progress.
+ *
+ *			     If emulation fails for a write to guest page tables,
+ *			     KVM unprotects (zaps) the shadow page for the target
+ *			     gfn and resumes the guest to retry the non-emulatable
+ *			     instruction (on hardware).  Unprotecting the gfn
+ *			     doesn't allow forward progress for a self-changing
+ *			     access because doing so also zaps the translation for
+ *			     the gfn, i.e. retrying the instruction will hit a
+ *			     !PRESENT fault, which results in a new shadow page
+ *			     and sends KVM back to square one.
  */
 #define EMULTYPE_NO_DECODE	    (1 << 0)
 #define EMULTYPE_TRAP_UD	    (1 << 1)
@@ -1900,6 +1902,7 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_VMWARE_GP	    (1 << 5)
 #define EMULTYPE_PF		    (1 << 6)
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
+#define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)
 
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..bf38575a1957 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4203,7 +4203,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	    work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
 		return;
 
-	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
+	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -5664,7 +5664,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 
 	if (r == RET_PF_INVALID) {
 		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false);
+					  lower_32_bits(error_code), false,
+					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
 	}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cc58631e2336..2cbb155c686c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -240,6 +240,13 @@ struct kvm_page_fault {
 	kvm_pfn_t pfn;
 	hva_t hva;
 	bool map_writable;
+
+	/*
+	 * Indicates the guest is trying to write a gfn that contains one or
+	 * more of the PTEs used to translate the write itself, i.e. the access
+	 * is changing its own translation in the guest page tables.
+	 */
+	bool write_fault_to_shadow_pgtable;
 };
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
@@ -273,7 +280,7 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch)
+					u32 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
@@ -312,6 +319,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
 
+	if (fault.write_fault_to_shadow_pgtable && emulation_type)
+		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
+
 	/*
 	 * Similar to above, prefetch faults aren't truly spurious, and the
 	 * async #PF path doesn't do emulation.  Do count faults that are fixed
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9..5d2958299b4f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -825,10 +825,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->user, &fault->write_fault_to_shadow_pgtable);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..de2a0d1c9c21 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8427,7 +8427,6 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 }
 
 static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				  bool write_fault_to_shadow_pgtable,
 				  int emulation_type)
 {
 	gpa_t gpa = cr2_or_gpa;
@@ -8498,7 +8497,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * be fixed by unprotecting shadow page and it should
 	 * be reported to userspace.
 	 */
-	return !write_fault_to_shadow_pgtable;
+	return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
 }
 
 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
@@ -8746,20 +8745,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	int r;
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	bool writeback = true;
-	bool write_fault_to_spt;
 
 	if (unlikely(!kvm_can_emulate_insn(vcpu, emulation_type, insn, insn_len)))
 		return 1;
 
 	vcpu->arch.l1tf_flush_l1d = true;
 
-	/*
-	 * Clear write_fault_to_shadow_pgtable here to ensure it is
-	 * never reused.
-	 */
-	write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	if (!(emulation_type & EMULTYPE_NO_DECODE)) {
 		kvm_clear_exception_queue(vcpu);
 
@@ -8780,7 +8771,6 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				return 1;
 			}
 			if (reexecute_instruction(vcpu, cr2_or_gpa,
-						  write_fault_to_spt,
 						  emulation_type))
 				return 1;
 
@@ -8859,8 +8849,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		return 1;
 
 	if (r == EMULATION_FAILED) {
-		if (reexecute_instruction(vcpu, cr2_or_gpa, write_fault_to_spt,
-					  emulation_type))
+		if (reexecute_instruction(vcpu, cr2_or_gpa, emulation_type))
 			return 1;
 
 		return handle_emulation_failure(vcpu, emulation_type);

From patchwork Thu Feb 2 18:28:16 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13126635
Reply-To: Sean Christopherson
Date: Thu, 2 Feb 2023 18:28:16 +0000
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182817.407394-3-seanjc@google.com>
Subject: [PATCH v2 2/3] KVM: x86/mmu: Detect write #PF to shadow pages during FNAME(fetch) walk
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
X-Mailing-List: kvm@vger.kernel.org

From: Lai Jiangshan

Move the detection of write #PF to shadow pages, i.e. a fault on a write
to a page table that is being shadowed by KVM and that is used to
translate the write itself, from FNAME(is_self_change_mapping) to
FNAME(fetch).  There is no need to detect the self-referential write
before kvm_faultin_pfn() as KVM does not consume EMULTYPE_WRITE_PF_TO_SP
for accesses that resolve to "error or no-slot" pfns, i.e. KVM doesn't
allow retrying MMIO accesses or writes to read-only memslots.

Detecting the EMULTYPE_WRITE_PF_TO_SP scenario in FNAME(fetch) will allow
dropping FNAME(is_self_change_mapping) entirely, as the hugepage
interaction can be deferred to kvm_mmu_hugepage_adjust().

Cc: Huang Hang
Signed-off-by: Lai Jiangshan
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson
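The heart of the change is the per-level comparison added to the fetch walk.
As a rough standalone illustration (ordinary userspace C, not KVM code; the
gfn values are made up), the detection boils down to flagging a write whose
target gfn matches the gfn of any page table used to translate that write:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t gfn_t;

  struct fault {
      gfn_t gfn;                              /* gfn the guest is writing */
      bool write;
      bool write_fault_to_shadow_pgtable;
  };

  /* Model of the walk: one table gfn per guest page-table level. */
  static void fetch_walk(struct fault *fault, const gfn_t *table_gfns, int levels)
  {
      for (int level = 0; level < levels; level++) {
          gfn_t table_gfn = table_gfns[level];

          /* ...shadow page for this level would be linked here... */

          if (fault->write && table_gfn == fault->gfn)
              fault->write_fault_to_shadow_pgtable = true;
      }
  }

  int main(void)
  {
      gfn_t tables[] = { 0x100, 0x200, 0x300 };          /* hypothetical table gfns */
      struct fault f = { .gfn = 0x200, .write = true };  /* write hits its own table */

      fetch_walk(&f, tables, 3);
      printf("self-referential write: %s\n",
             f.write_fault_to_shadow_pgtable ? "yes" : "no");
      return 0;
  }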
---
 arch/x86/kvm/mmu/paging_tmpl.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5d2958299b4f..f57d9074fb9b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -685,6 +685,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 
 		if (sp != ERR_PTR(-EEXIST))
 			link_shadow_page(vcpu, it.sptep, sp);
+
+		if (fault->write && table_gfn == fault->gfn)
+			fault->write_fault_to_shadow_pgtable = true;
 	}
 
 	kvm_mmu_hugepage_adjust(vcpu, fault);
@@ -741,17 +744,13 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
  * created when kvm establishes shadow page table that stop kvm using large
  * page size. Do it early can avoid unnecessary #PF and emulation.
  *
- * @write_fault_to_shadow_pgtable will return true if the fault gfn is
- * currently used as its page table.
- *
  * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
  * since the PDPT is always shadowed, that means, we can not use large page
  * size to map the gfn which is used as PDPT.
  */
 static bool
 FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
-			      struct guest_walker *walker, bool user_fault,
-			      bool *write_fault_to_shadow_pgtable)
+			      struct guest_walker *walker, bool user_fault)
 {
 	int level;
 	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
@@ -765,7 +764,6 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
 		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
 
 		self_changed |= !(gfn & mask);
-		*write_fault_to_shadow_pgtable |= !gfn;
 	}
 
 	return self_changed;
@@ -826,7 +824,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return r;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &fault->write_fault_to_shadow_pgtable);
+	      &walker, fault->user);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;

From patchwork Thu Feb 2 18:28:17 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13126637
Reply-To: Sean Christopherson
Date: Thu, 2 Feb 2023 18:28:17 +0000
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182817.407394-4-seanjc@google.com>
Subject: [PATCH v2 3/3] KVM: x86/mmu: Remove FNAME(is_self_change_mapping)
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
X-Mailing-List: kvm@vger.kernel.org

From: Lai Jiangshan

Drop FNAME(is_self_change_mapping) and instead rely on
kvm_mmu_hugepage_adjust() to adjust the hugepage accordingly.  Prior to
commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a
level"), the hugepage adjustment was done before allocating new shadow
pages, i.e. failed to restrict the hugepage sizes if a new shadow page
resulted in account_shadowed() changing the disallowed hugepage tracking.

Removing FNAME(is_self_change_mapping) fixes a bug reported by Huang Hang
where KVM unnecessarily forces a 4KiB page.  FNAME(is_self_change_mapping)
has a defect in that it blindly disables _all_ hugepage mappings rather
than trying to reduce the size of the hugepage.  If the guest is writing
to a 1GiB page and the 1GiB region is self-referential but a 2MiB region
is not, then KVM can and should create a 2MiB mapping.

Add a comment above the call to kvm_mmu_hugepage_adjust() to call out the
new dependency on adjusting the hugepage size after walking indirect PTEs.

Reported-by: Huang Hang
Signed-off-by: Lai Jiangshan
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: rework changelog after separating out the emulator change]
Signed-off-by: Sean Christopherson
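The net effect is that the mapping size is now reduced only as far as
necessary rather than unconditionally dropped to 4KiB.  The snippet below is
a loose, self-contained model of that outcome (ordinary userspace C, not
KVM's actual bookkeeping, which relies on account_shadowed() and the
disallowed-hugepage tracking noted above; levels and gfn values are made up):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t gfn_t;

  /* Level 1 = 4KiB, 2 = 2MiB, 3 = 1GiB; 512 entries per level as on x86-64. */
  static gfn_t pages_per_level(int level)
  {
      gfn_t n = 1;
      while (--level)
          n *= 512;
      return n;
  }

  static bool range_contains(gfn_t base, gfn_t npages, gfn_t gfn)
  {
      return gfn >= base && gfn < base + npages;
  }

  /* Pick the largest level whose range doesn't cover a shadowed table gfn. */
  static int adjust_level(gfn_t fault_gfn, gfn_t table_gfn, int max_level)
  {
      for (int level = max_level; level > 1; level--) {
          gfn_t npages = pages_per_level(level);
          gfn_t base = fault_gfn & ~(npages - 1);

          if (!range_contains(base, npages, table_gfn))
              return level;
      }
      return 1;
  }

  int main(void)
  {
      gfn_t fault_gfn = 0x40123;  /* write target */
      gfn_t table_gfn = 0x7f000;  /* page table shadowed by KVM */

      /* The 1GiB range covers the table, the 2MiB range does not -> level 2. */
      printf("mapping level: %d\n", adjust_level(fault_gfn, table_gfn, 3));
      return 0;
  }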
---
 arch/x86/kvm/mmu/paging_tmpl.h | 51 +++++-----------------------------
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f57d9074fb9b..a056f2773dd9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -690,6 +690,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			fault->write_fault_to_shadow_pgtable = true;
 	}
 
+	/*
+	 * Adjust the hugepage size _after_ resolving indirect shadow pages.
+	 * KVM doesn't support mapping hugepages into the guest for gfns that
+	 * are being shadowed by KVM, i.e. allocating a new shadow page may
+	 * affect the allowed hugepage size.
+	 */
 	kvm_mmu_hugepage_adjust(vcpu, fault);
 
 	trace_kvm_mmu_spte_requested(fault);
@@ -734,41 +740,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	return RET_PF_RETRY;
 }
 
-/*
- * To see whether the mapped gfn can write its page table in the current
- * mapping.
- *
- * It is the helper function of FNAME(page_fault). When guest uses large page
- * size to map the writable gfn which is used as current page table, we should
- * force kvm to use small page size to map it because new shadow page will be
- * created when kvm establishes shadow page table that stop kvm using large
- * page size. Do it early can avoid unnecessary #PF and emulation.
- *
- * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
- * since the PDPT is always shadowed, that means, we can not use large page
- * size to map the gfn which is used as PDPT.
- */
-static bool
-FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
-			      struct guest_walker *walker, bool user_fault)
-{
-	int level;
-	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
-	bool self_changed = false;
-
-	if (!(walker->pte_access & ACC_WRITE_MASK ||
-	      (!is_cr0_wp(vcpu->arch.mmu) && !user_fault)))
-		return false;
-
-	for (level = walker->level; level <= walker->max_level; level++) {
-		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
-
-		self_changed |= !(gfn & mask);
-	}
-
-	return self_changed;
-}
-
 /*
  * Page fault handler. There are several causes for a page fault:
  *   - there is no shadow pte for the guest pte
@@ -787,7 +758,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 {
 	struct guest_walker walker;
 	int r;
-	bool is_self_change_mapping;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
@@ -812,6 +782,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	}
 
 	fault->gfn = walker.gfn;
+	fault->max_level = walker.level;
 	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
 
 	if (page_fault_handle_page_track(vcpu, fault)) {
@@ -823,14 +794,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user);
-
-	if (is_self_change_mapping)
-		fault->max_level = PG_LEVEL_4K;
-	else
-		fault->max_level = walker.level;
-
 	r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
 	if (r != RET_PF_CONTINUE)
 		return r;