From patchwork Fri May 13 20:28:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Matlack X-Patchwork-Id: 12849413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C93A1C433FE for ; Fri, 13 May 2022 20:28:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384187AbiEMU2h (ORCPT ); Fri, 13 May 2022 16:28:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1384206AbiEMU2g (ORCPT ); Fri, 13 May 2022 16:28:36 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9D6717E0E for ; Fri, 13 May 2022 13:28:34 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id r16-20020a17090b051000b001db302efed7so4845362pjz.2 for ; Fri, 13 May 2022 13:28:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=pgDEIvE866n0o/BO45PlwG3Ec/tBHSaPSgThM0YfR5s=; b=IVlToeYqV9KmMEnWI3N6sq1ln/1Dw7FywPeoGsKh1RIhTgxViFYaL9t0rP63nB7ySs MOHvv05m+k2OrWenuW5F7eLRRNMoKBDk685pgdPKQl1xhr9SXVk5KHu/yJd8IhSzg932 9X4FVoYGMhu0XGm6jJsQbZhyol7GPzB0IQufFFy4Cg2CjkDYcmMvvGEkTKJ4IrOpQyAM dvHNx6gyWx4d5QusgdQ7ZGEZwEI6cos/PNdTXd19udLGvf523sWYpoJnRsXoLkEnmVYV e/1/ykmNY3ruoF6HyuUeOxJ8EM64QlLW4EfwNYSPtx+46g1XQ7aGJRHvDLFecoaoXSjK BPtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=pgDEIvE866n0o/BO45PlwG3Ec/tBHSaPSgThM0YfR5s=; b=ylI6nWQJd0E9uxePV5rBCuU8JyAdge9HWfClP04Ei7yce58fFojst7HfVnCW2NN5J9 Qj+Fm2QTGVInnIuzeZ5H2JWkYc4Hivk/Au4iiGau+Njo0FdubHD2g2b59hUSR5Etp7oY JkSfNoepj2lew6U++NI05+opvh4ZTUjmfcEAYoXzSRRUnYeTLtC+S/BPWk46+TJpCckZ G7MSdyaaHgsW72Hkn1ZPxsG/hoNFjJiwpQT7Hc2+zrk1p6mhYYVT3HU0W7CYfJ2O7b9i Su1GEgSeBZh4LHK5aNRhmBYf6nBtA6MtrcM8sD3BH2MUuAehhsQ75Iv4h5WFCdPFfS5v wK2g== X-Gm-Message-State: AOAM533SANxVdT7KQu7SBm5MjsheJS023ZcC7LRFMWf9ZP0Hdbejf2lu PLqRsEaG6bisJxFvGTBvqXW3zogqaEnncg== X-Google-Smtp-Source: ABdhPJxxOcGSAhPEhB2ArKuzNddojx8ZJtiAE8cELlCZr6dmvCLtF/ySB/ByD6inh3CWz9NslbwtIUi/a3Y5sg== X-Received: from dmatlack-heavy.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:19cd]) (user=dmatlack job=sendgmr) by 2002:a17:90a:8d83:b0:1dd:258c:7c55 with SMTP id d3-20020a17090a8d8300b001dd258c7c55mr250778pjo.1.1652473712601; Fri, 13 May 2022 13:28:32 -0700 (PDT) Date: Fri, 13 May 2022 20:28:01 +0000 In-Reply-To: <20220513202819.829591-1-dmatlack@google.com> Message-Id: <20220513202819.829591-4-dmatlack@google.com> Mime-Version: 1.0 References: <20220513202819.829591-1-dmatlack@google.com> X-Mailer: git-send-email 2.36.0.550.gb090851708-goog Subject: [PATCH v5 03/21] KVM: x86/mmu: Derive shadow MMU page role from parent From: David Matlack To: Paolo Bonzini Cc: Marc Zyngier , Huacai Chen , Aleksandar Markovic , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , Sean Christopherson , Andrew Jones , Ben Gardon , Peter Xu , maciej.szmigiero@oracle.com, "moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)" , "open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" , "open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" , "open list:KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)" , Peter Feiner , Lai Jiangshan , David Matlack Precedence: bulk List-ID: X-Mailing-List: linux-mips@vger.kernel.org Instead of computing the shadow page role from scratch for every new page, derive most of the information from the parent shadow page. This eliminates the dependency on the vCPU root role to allocate shadow page tables, and reduces the number of parameters to kvm_mmu_get_page(). Preemptively split out the role calculation to a separate function for use in a following commit. No functional change intended. Reviewed-by: Peter Xu Signed-off-by: David Matlack --- arch/x86/kvm/mmu/mmu.c | 103 +++++++++++++++++++++++---------- arch/x86/kvm/mmu/paging_tmpl.h | 9 +-- 2 files changed, 77 insertions(+), 35 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 34fb0cddff2b..a927a062a598 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2019,33 +2019,15 @@ static void clear_sp_write_flooding_count(u64 *spte) __clear_sp_write_flooding_count(sptep_to_sp(spte)); } -static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, - gfn_t gfn, - gva_t gaddr, - unsigned level, - bool direct, - unsigned int access) +static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gfn_t gfn, + union kvm_mmu_page_role role) { - union kvm_mmu_page_role role; struct hlist_head *sp_list; - unsigned quadrant; struct kvm_mmu_page *sp; int ret; int collisions = 0; LIST_HEAD(invalid_list); - role = vcpu->arch.mmu->root_role; - role.level = level; - role.direct = direct; - role.access = access; - if (role.has_4_byte_gpte) { - quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)); - quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; - role.quadrant = quadrant; - } - if (level <= vcpu->arch.mmu->cpu_role.base.level) - role.passthrough = 0; - sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]; for_each_valid_sp(vcpu->kvm, sp, sp_list) { if (sp->gfn != gfn) { @@ -2063,7 +2045,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, * Unsync pages must not be left as is, because the new * upper-level page will be write-protected. */ - if (level > PG_LEVEL_4K && sp->unsync) + if (role.level > PG_LEVEL_4K && sp->unsync) kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); continue; @@ -2104,14 +2086,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, ++vcpu->kvm->stat.mmu_cache_miss; - sp = kvm_mmu_alloc_page(vcpu, direct); + sp = kvm_mmu_alloc_page(vcpu, role.direct); sp->gfn = gfn; sp->role = role; hlist_add_head(&sp->hash_link, sp_list); if (sp_has_gptes(sp)) { account_shadowed(vcpu->kvm, sp); - if (level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn)) + if (role.level == PG_LEVEL_4K && kvm_vcpu_write_protect_gfn(vcpu, gfn)) kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1); } trace_kvm_mmu_get_page(sp, true); @@ -2123,6 +2105,55 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, return sp; } +static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access) +{ + struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep); + union kvm_mmu_page_role role; + + role = parent_sp->role; + role.level--; + role.access = access; + role.direct = direct; + role.passthrough = 0; + + /* + * If the guest has 4-byte PTEs then that means it's using 32-bit, + * 2-level, non-PAE paging. KVM shadows such guests with PAE paging + * (i.e. 8-byte PTEs). The difference in PTE size means that KVM must + * shadow each guest page table with multiple shadow page tables, which + * requires extra bookkeeping in the role. + * + * Specifically, to shadow the guest's page directory (which covers a + * 4GiB address space), KVM uses 4 PAE page directories, each mapping + * 1GiB of the address space. @role.quadrant encodes which quarter of + * the address space each maps. + * + * To shadow the guest's page tables (which each map a 4MiB region), KVM + * uses 2 PAE page tables, each mapping a 2MiB region. For these, + * @role.quadrant encodes which half of the region they map. + * + * Note, the 4 PAE page directories are pre-allocated and the quadrant + * assigned in mmu_alloc_root(). So only page tables need to be handled + * here. + */ + if (role.has_4_byte_gpte) { + WARN_ON_ONCE(role.level != PG_LEVEL_4K); + role.quadrant = (sptep - parent_sp->spt) % 2; + } + + return role; +} + +static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, + u64 *sptep, gfn_t gfn, + bool direct, u32 access) +{ + union kvm_mmu_page_role role; + + role = kvm_mmu_child_role(sptep, direct, access); + return kvm_mmu_get_page(vcpu, gfn, role); +} + static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, hpa_t root, u64 addr) @@ -2965,8 +2996,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) if (is_shadow_present_pte(*it.sptep)) continue; - sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr, - it.level - 1, true, ACC_ALL); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL); link_shadow_page(vcpu, it.sptep, sp); if (fault->is_tdp && fault->huge_page_disallowed && @@ -3369,12 +3399,24 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn) return ret; } -static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, +static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level, bool direct) { + union kvm_mmu_page_role role; struct kvm_mmu_page *sp; - sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL); + role = vcpu->arch.mmu->root_role; + role.level = level; + role.direct = direct; + role.access = ACC_ALL; + + if (role.has_4_byte_gpte) + role.quadrant = quadrant; + + if (level <= vcpu->arch.mmu->cpu_role.base.level) + role.passthrough = 0; + + sp = kvm_mmu_get_page(vcpu, gfn, role); ++sp->root_count; return __pa(sp->spt); @@ -3408,8 +3450,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) for (i = 0; i < 4; ++i) { WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i])); - root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), - i << 30, PT32_ROOT_LEVEL, true); + root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), i, + PT32_ROOT_LEVEL, true); mmu->pae_root[i] = root | PT_PRESENT_MASK | shadow_me_mask; } @@ -3578,8 +3620,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) root_gfn = pdptrs[i] >> PAGE_SHIFT; } - root = mmu_alloc_root(vcpu, root_gfn, i << 30, - PT32_ROOT_LEVEL, false); + root = mmu_alloc_root(vcpu, root_gfn, i, PT32_ROOT_LEVEL, false); mmu->pae_root[i] = root | pm_mask; } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index db80f7ccaa4e..fd73c857af90 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -648,8 +648,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, if (!is_shadow_present_pte(*it.sptep)) { table_gfn = gw->table_gfn[it.level - 2]; access = gw->pt_access[it.level - 2]; - sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr, - it.level-1, false, access); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn, + false, access); + /* * We must synchronize the pagetable before linking it * because the guest doesn't need to flush tlb when @@ -705,8 +706,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, drop_large_spte(vcpu, it.sptep); if (!is_shadow_present_pte(*it.sptep)) { - sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr, - it.level - 1, true, direct_access); + sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, + true, direct_access); link_shadow_page(vcpu, it.sptep, sp); if (fault->huge_page_disallowed && fault->req_level >= it.level)