From patchwork Wed Nov 15 17:16:31 2023
X-Patchwork-Submitter: Sebastian Ene
X-Patchwork-Id: 13457133
Date: Wed, 15 Nov 2023 17:16:31 +0000
In-Reply-To: <20231115171639.2852644-2-sebastianene@google.com>
References: <20231115171639.2852644-2-sebastianene@google.com>
Message-ID: <20231115171639.2852644-3-sebastianene@google.com>
Subject: [PATCH v3 01/10] KVM: arm64: Add snap shooting the host stage-2 pagetables
From: Sebastian Ene
To: will@kernel.org, Oliver Upton, James Morse, Suzuki K Poulose,
 Zenghui Yu, catalin.marinas@arm.com, mark.rutland@arm.com,
 akpm@linux-foundation.org, maz@kernel.org
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, kernel-team@android.com,
 vdonnefort@google.com, qperret@google.com, smostafa@google.com,
 Sebastian Ene

Introduce a new HVC that lets the caller snapshot the host stage-2
pagetables when CONFIG_NVHE_EL2_DEBUG is enabled. The caller allocates
the memory that receives the copy, shares it with the hypervisor, and
specifies its location when issuing the HVC.

Signed-off-by: Sebastian Ene
---
 arch/arm64/include/asm/kvm_asm.h              |   1 +
 arch/arm64/include/asm/kvm_pgtable.h          |  36 +++++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  20 ++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 102 ++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c                  |  56 ++++++++++
 6 files changed, 216 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24b5e6b23417..99145a24c0f6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -81,6 +81,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
 	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_copy_host_stage2,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d3e354bb8351..be615700f8ac 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 
 #define KVM_PGTABLE_MAX_LEVELS		4U
 
@@ -351,6 +352,21 @@ struct kvm_pgtable {
 	kvm_pgtable_force_pte_cb_t		force_pte_cb;
 };
 
+/**
+ * struct kvm_pgtable_snapshot - Snapshot page-table wrapper.
+ * @pgtable:		The page-table configuration.
+ * @mc:			Memcache used for pagetable pages allocation.
+ * @pgd_hva:		Host virtual address of a physically contiguous buffer
+ *			used for storing the PGD.
+ * @pgd_len:		The size of the physically contiguous buffer in bytes.
+ */
+struct kvm_pgtable_snapshot {
+	struct kvm_pgtable			pgtable;
+	struct kvm_hyp_memcache			mc;
+	void					*pgd_hva;
+	size_t					pgd_len;
+};
+
 /**
  * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
@@ -756,4 +772,24 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
  */
 void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 				phys_addr_t addr, size_t size);
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+/**
+ * kvm_pgtable_stage2_copy() - Snapshot the pagetable
+ *
+ * @to_pgt:	Destination pagetable
+ * @from_pgt:	Source pagetable. The caller must lock the pagetables first
+ * @mc:		The memcache where we allocate the destination pagetables from
+ */
+int kvm_pgtable_stage2_copy(struct kvm_pgtable *to_pgt,
+			    const struct kvm_pgtable *from_pgt,
+			    void *mc);
+#else
+static inline int kvm_pgtable_stage2_copy(struct kvm_pgtable *to_pgt,
+					  const struct kvm_pgtable *from_pgt,
+					  void *mc)
+{
+	return -EPERM;
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
 #endif /* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 0972faccc2af..9cfb35d68850 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -69,6 +69,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_stage2_prepare_copy(struct kvm_pgtable_snapshot *snapshot);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 2385fd03ed87..98646cc67497 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -314,6 +314,25 @@ static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
 }
 
+static void handle___pkvm_copy_host_stage2(struct kvm_cpu_context *host_ctxt)
+{
+#ifdef CONFIG_NVHE_EL2_DEBUG
+	int ret = -EPERM;
+	DECLARE_REG(struct kvm_pgtable_snapshot *, snapshot, host_ctxt, 1);
+	kvm_pteref_t pgd;
+
+	snapshot = kern_hyp_va(snapshot);
+	ret = __pkvm_host_stage2_prepare_copy(snapshot);
+	if (!ret) {
+		pgd = snapshot->pgtable.pgd;
+		snapshot->pgtable.pgd = (kvm_pteref_t)__hyp_pa(pgd);
+	}
+	cpu_reg(host_ctxt, 1) = ret;
+#else
+	cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
+#endif
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -348,6 +367,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
 	HANDLE_FUNC(__pkvm_teardown_vm),
+	HANDLE_FUNC(__pkvm_copy_host_stage2),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 8d0a5834e883..1c3ab5ac9110 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -266,6 +266,108 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
 	return 0;
 }
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+static struct hyp_pool snapshot_pool = {0};
+static DEFINE_HYP_SPINLOCK(snapshot_pool_lock);
+
+static void *snapshot_zalloc_pages_exact(size_t size)
+{
+	void *addr = hyp_alloc_pages(&snapshot_pool, get_order(size));
+
+	hyp_split_page(hyp_virt_to_page(addr));
+
+	/*
+	 * The size of concatenated PGDs is always a power of two of PAGE_SIZE,
+	 * so there should be no need to free any of the tail pages to make the
+	 * allocation exact.
+	 */
+	WARN_ON(size != (PAGE_SIZE << get_order(size)));
+
+	return addr;
+}
+
+static void snapshot_get_page(void *addr)
+{
+	hyp_get_page(&snapshot_pool, addr);
+}
+
+static void *snapshot_zalloc_page(void *mc)
+{
+	struct hyp_page *p;
+	void *addr;
+
+	addr = hyp_alloc_pages(&snapshot_pool, 0);
+	if (addr)
+		return addr;
+
+	addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+	if (!addr)
+		return addr;
+
+	memset(addr, 0, PAGE_SIZE);
+	p = hyp_virt_to_page(addr);
+	memset(p, 0, sizeof(*p));
+	p->refcount = 1;
+
+	return addr;
+}
+
+static void snapshot_s2_free_pages_exact(void *addr, unsigned long size)
+{
+	u8 order = get_order(size);
+	unsigned int i;
+	struct hyp_page *p;
+
+	for (i = 0; i < (1 << order); i++) {
+		p = hyp_virt_to_page(addr + (i * PAGE_SIZE));
+		hyp_page_ref_dec_and_test(p);
+	}
+}
+
+int __pkvm_host_stage2_prepare_copy(struct kvm_pgtable_snapshot *snapshot)
+{
+	size_t required_pgd_len;
+	struct kvm_pgtable_mm_ops mm_ops = {0};
+	struct kvm_s2_mmu *mmu = &host_mmu.arch.mmu;
+	struct kvm_pgtable *to_pgt, *from_pgt = &host_mmu.pgt;
+	struct kvm_hyp_memcache *memcache = &snapshot->mc;
+	int ret;
+	void *pgd;
+
+	required_pgd_len = kvm_pgtable_stage2_pgd_size(mmu->vtcr);
+	if (snapshot->pgd_len < required_pgd_len)
+		return -ENOMEM;
+
+	to_pgt = &snapshot->pgtable;
+	pgd = kern_hyp_va(snapshot->pgd_hva);
+
+	hyp_spin_lock(&snapshot_pool_lock);
+	hyp_pool_init(&snapshot_pool, hyp_virt_to_pfn(pgd),
+		      required_pgd_len / PAGE_SIZE, 0);
+
+	mm_ops.zalloc_pages_exact	= snapshot_zalloc_pages_exact;
+	mm_ops.zalloc_page		= snapshot_zalloc_page;
+	mm_ops.free_pages_exact		= snapshot_s2_free_pages_exact;
+	mm_ops.get_page			= snapshot_get_page;
+	mm_ops.phys_to_virt		= hyp_phys_to_virt;
+	mm_ops.virt_to_phys		= hyp_virt_to_phys;
+	mm_ops.page_count		= hyp_page_count;
+
+	to_pgt->ia_bits		= from_pgt->ia_bits;
+	to_pgt->start_level	= from_pgt->start_level;
+	to_pgt->flags		= from_pgt->flags;
+	to_pgt->mm_ops		= &mm_ops;
+
+	host_lock_component();
+	ret = kvm_pgtable_stage2_copy(to_pgt, from_pgt, memcache);
+	host_unlock_component();
+
+	hyp_spin_unlock(&snapshot_pool_lock);
+
+	return ret;
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 {
 	void *addr;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 1966fdee740e..46b15d74118f 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1598,3 +1598,59 @@ void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *p
 	WARN_ON(mm_ops->page_count(pgtable) != 1);
 	mm_ops->put_page(pgtable);
 }
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+static int stage2_copy_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			      enum kvm_pgtable_walk_flags visit)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+	void *copy_table, *original_addr;
+	kvm_pte_t new = ctx->old;
+
+	if (!stage2_pte_is_counted(ctx->old))
+		return 0;
+
+	if (kvm_pte_table(ctx->old, ctx->level)) {
+		copy_table = mm_ops->zalloc_page(ctx->arg);
+		if (!copy_table)
+			return -ENOMEM;
+
+		original_addr = kvm_pte_follow(ctx->old, mm_ops);
+
+		memcpy(copy_table, original_addr, PAGE_SIZE);
+		new = kvm_init_table_pte(copy_table, mm_ops);
+	}
+
+	*ctx->ptep = new;
+
+	return 0;
+}
+
+int kvm_pgtable_stage2_copy(struct kvm_pgtable *to_pgt,
+			    const struct kvm_pgtable *from_pgt,
+			    void *mc)
+{
+	int ret;
+	size_t pgd_sz;
+	struct kvm_pgtable_mm_ops *mm_ops = to_pgt->mm_ops;
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_copy_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF |
+			  KVM_PGTABLE_WALK_TABLE_PRE,
+		.arg	= mc
+	};
+
+	pgd_sz = kvm_pgd_pages(to_pgt->ia_bits, to_pgt->start_level) *
+		 PAGE_SIZE;
+	to_pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
+	if (!to_pgt->pgd)
+		return -ENOMEM;
+
+	memcpy(to_pgt->pgd, from_pgt->pgd, pgd_sz);
+
+	ret = kvm_pgtable_walk(to_pgt, 0, BIT(to_pgt->ia_bits), &walker);
+	mm_ops->free_pages_exact(to_pgt->pgd, pgd_sz);
+
+	return ret;
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
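
For readers following the copy logic in stage2_copy_walker() and
kvm_pgtable_stage2_copy(), the idea can be sketched in plain userspace C.
This is only an illustrative model under stated assumptions, not the patch's
code: struct table, struct entry, struct pool, pool_alloc(), copy_children()
and snapshot_tree() are hypothetical names, the fixed-size pool merely stands
in for the caller-provided memory, and real stage-2 descriptors, refcounts
and locking are left out. It copies the root table first, then duplicates
every next-level table and re-points the copied entry at the duplicate,
failing with -ENOMEM once the pool is exhausted.

```c
/* Userspace model of a table-copy walk; all names here are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES		4	/* entries per table (toy value) */
#define POOL_TABLES	8	/* tables available for the copy */

/* A toy entry is invalid, a leaf value, or a pointer to a next-level table. */
enum kind { INVALID = 0, LEAF, TABLE };

struct table;

struct entry {
	enum kind kind;
	uint64_t leaf;		/* meaningful when kind == LEAF */
	struct table *next;	/* meaningful when kind == TABLE */
};

struct table {
	struct entry e[ENTRIES];
};

/* Fixed pool standing in for the memory the caller hands over. */
struct pool {
	struct table pages[POOL_TABLES];
	size_t used;
};

/* Hand out the next zeroed table, or NULL when the pool is exhausted. */
static struct table *pool_alloc(struct pool *p)
{
	if (p->used == POOL_TABLES)
		return NULL;
	memset(&p->pages[p->used], 0, sizeof(struct table));
	return &p->pages[p->used++];
}

/* Replace every table pointer in 'copy' with a freshly allocated duplicate. */
static int copy_children(struct table *copy, struct pool *p)
{
	for (size_t i = 0; i < ENTRIES; i++) {
		struct entry *ent = &copy->e[i];
		struct table *dup;
		int ret;

		if (ent->kind != TABLE)
			continue;	/* leaves and invalid entries stay as copied */

		dup = pool_alloc(p);
		if (!dup)
			return -ENOMEM;

		memcpy(dup, ent->next, sizeof(*dup));
		ent->next = dup;	/* the copy no longer aliases the original */

		ret = copy_children(dup, p);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Snapshot the tree rooted at 'root' into 'p'. *out is set as soon as the
 * root copy exists, so it may point at a partial copy if the walk fails.
 */
static int snapshot_tree(const struct table *root, struct pool *p,
			 struct table **out)
{
	struct table *copy = pool_alloc(p);

	if (!copy)
		return -ENOMEM;

	memcpy(copy, root, sizeof(*copy));
	*out = copy;
	return copy_children(copy, p);
}
```

In this model a later change to the original tree does not show up in the
snapshot, because every table reachable from the copied root lives in the
pool. Whether the same holds for the patch depends on details the sketch
omits, such as the locking around the walk.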