From patchwork Tue Jul 30 08:24:27 2024
X-Patchwork-Submitter: Liju-clr Chen
X-Patchwork-Id: 13747022
From: Liju-clr Chen
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
 Catalin Marinas, Will Deacon, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Richard Cochran, Matthias Brugger,
 AngeloGioacchino Del Regno, Liju-clr Chen, Yingshiuan Pan, Ze-yu Wang
CC: Shawn Hsiao, PeiLun Suei, Chi-shen Yeh, Kevenny Hsieh
Subject: [PATCH v12 15/24] virt: geniezone: Add memory pin/unpin support
Date: Tue, 30 Jul 2024 16:24:27 +0800
Message-ID: <20240730082436.9151-16-liju-clr.chen@mediatek.com>
In-Reply-To: <20240730082436.9151-1-liju-clr.chen@mediatek.com>
References: <20240730082436.9151-1-liju-clr.chen@mediatek.com>
X-BeenThere: linux-arm-kernel@lists.infradead.org

From: Jerry Wang

A protected VM's memory cannot be swapped out because the memory pages
are protected from host
access. If the host accesses those protected pages, a hardware exception
is triggered, which may crash the host. Therefore, the protected pages
must be made ineligible for swapping or merging by the host kernel so
that the host never accesses them. To do so, we pin a page when it is
assigned (donated) to the VM and unpin it when the VM relinquishes the
page or is destroyed.

In addition, the hypervisor must clear a protected VM's memory before
returning it to the host. However, the VMM may free that memory before
it has been cleared, in which case the pages could be reclaimed and
reused before clearing completes. Pinning and unpinning also prevents
this problem.

The implementation is as follows:
- Use an rb_tree to store the pinned memory pages.
- Pin the page when handling a page fault.
- Unpin the pages when the VM relinquishes them or is destroyed.

Signed-off-by: Jerry Wang
Co-developed-by: Yingshiuan Pan
Signed-off-by: Yingshiuan Pan
Signed-off-by: Yi-De Wu
Signed-off-by: Liju Chen
---
 arch/arm64/geniezone/vm.c             |   8 +-
 drivers/virt/geniezone/Makefile       |   2 +-
 drivers/virt/geniezone/gzvm_mmu.c     | 103 ++++++++++++++++++++++++++
 drivers/virt/geniezone/gzvm_vm.c      |  21 ++++++
 include/linux/soc/mediatek/gzvm_drv.h |  14 ++++
 5 files changed, 145 insertions(+), 3 deletions(-)
 create mode 100644 drivers/virt/geniezone/gzvm_mmu.c

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index db16716f5b8d..109e4125739a 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -211,12 +211,14 @@ static int gzvm_vm_ioctl_get_pvmfw_size(struct gzvm *gzvm,
  * @gfn: Guest frame number.
  * @total_pages: Total page numbers.
  * @slot: Pointer to struct gzvm_memslot.
+ * @gzvm: Pointer to struct gzvm.
  *
  * Return: how many pages we've fill in, negative if error
  */
 static int fill_constituents(struct mem_region_addr_range *consti,
 			     int *consti_cnt, int max_nr_consti, u64 gfn,
-			     u32 total_pages, struct gzvm_memslot *slot)
+			     u32 total_pages, struct gzvm_memslot *slot,
+			     struct gzvm *gzvm)
 {
 	u64 pfn = 0, prev_pfn = 0, gfn_end = 0;
 	int nr_pages = 0;
@@ -227,6 +229,8 @@ static int fill_constituents(struct mem_region_addr_range *consti,
 	gfn_end = gfn + total_pages;
 
 	while (i < max_nr_consti && gfn < gfn_end) {
+		if (gzvm_vm_allocate_guest_page(gzvm, slot, gfn, &pfn) != 0)
+			return -EFAULT;
 		if (pfn == (prev_pfn + 1)) {
 			consti[i].pg_cnt++;
 		} else {
@@ -282,7 +286,7 @@ int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id)
 		nr_pages = fill_constituents(region->constituents,
 					     &region->constituent_cnt,
 					     max_nr_consti, gfn,
-					     remain_pages, memslot);
+					     remain_pages, memslot, gzvm);
 
 		if (nr_pages < 0) {
 			pr_err("Failed to fill constituents\n");
diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index bc5ae49f2407..e0451145215d 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -8,4 +8,4 @@ GZVM_DIR ?= ../../../drivers/virt/geniezone
 
 gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o \
 	  $(GZVM_DIR)/gzvm_vcpu.o $(GZVM_DIR)/gzvm_irqfd.o \
-	  $(GZVM_DIR)/gzvm_ioeventfd.o
+	  $(GZVM_DIR)/gzvm_ioeventfd.o $(GZVM_DIR)/gzvm_mmu.o
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
new file mode 100644
index 000000000000..743df8976dfd
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include 
+
+static int cmp_ppages(struct rb_node *node, const struct rb_node *parent)
+{
+	struct gzvm_pinned_page *a = container_of(node,
+						  struct gzvm_pinned_page,
+						  node);
+	struct gzvm_pinned_page *b = container_of(parent,
+						  struct gzvm_pinned_page,
+						  node);
+
+	if (a->ipa < b->ipa)
+		return -1;
+	if (a->ipa > b->ipa)
+		return 1;
+	return 0;
+}
+
+/* The caller is responsible for locking */
+static int gzvm_insert_ppage(struct gzvm *vm, struct gzvm_pinned_page *ppage)
+{
+	if (rb_find_add(&ppage->node, &vm->pinned_pages, cmp_ppages))
+		return -EEXIST;
+	return 0;
+}
+
+static int pin_one_page(struct gzvm *vm, unsigned long hva, u64 gpa,
+			struct page **out_page)
+{
+	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+	struct gzvm_pinned_page *ppage = NULL;
+	struct mm_struct *mm = current->mm;
+	struct page *page = NULL;
+	int ret;
+
+	ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
+	if (!ppage)
+		return -ENOMEM;
+
+	mmap_read_lock(mm);
+	ret = pin_user_pages(hva, 1, flags, &page);
+	mmap_read_unlock(mm);
+
+	if (ret != 1 || !page) {
+		kfree(ppage);
+		return -EFAULT;
+	}
+
+	ppage->page = page;
+	ppage->ipa = gpa;
+
+	mutex_lock(&vm->mem_lock);
+	ret = gzvm_insert_ppage(vm, ppage);
+
+	/*
+	 * An -EEXIST return from gzvm_insert_ppage() is expected here.
+	 * It happens when two or more VCPUs handle demand paging for the
+	 * same page concurrently: the first VCPU has already allocated,
+	 * pinned and recorded the page in the tree, while a later VCPU has
+	 * just pinned the same page again.
+	 * In that case, the later VCPU drops its extra pin, frees its
+	 * unused tracking structure and returns 0.
+	 */
+	if (ret == -EEXIST) {
+		kfree(ppage);
+		unpin_user_pages(&page, 1);
+		ret = 0;
+	}
+	mutex_unlock(&vm->mem_lock);
+	*out_page = page;
+
+	return ret;
+}
+
+int gzvm_vm_allocate_guest_page(struct gzvm *vm, struct gzvm_memslot *slot,
+				u64 gfn, u64 *pfn)
+{
+	struct page *page = NULL;
+	unsigned long hva;
+	int ret;
+
+	if (gzvm_gfn_to_hva_memslot(slot, gfn, (u64 *)&hva) != 0)
+		return -EINVAL;
+
+	ret = pin_one_page(vm, hva, PFN_PHYS(gfn), &page);
+	if (ret != 0)
+		return ret;
+
+	if (page == NULL)
+		return -EFAULT;
+	/*
+	 * pin_user_pages() has already returned the struct page, so derive
+	 * the PFN from it directly instead of calling another lookup API.
+	 */
+	*pfn = page_to_pfn(page);
+
+	return 0;
+}
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 5ae5abff2955..b4a68ba905b1 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -298,6 +298,22 @@ static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
 	return ret;
 }
 
+/* The caller is responsible for locking */
+static void gzvm_destroy_all_ppage(struct gzvm *gzvm)
+{
+	struct gzvm_pinned_page *ppage;
+	struct rb_node *node;
+
+	node = rb_first(&gzvm->pinned_pages);
+	while (node) {
+		ppage = rb_entry(node, struct gzvm_pinned_page, node);
+		unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+		node = rb_next(node);
+		rb_erase(&ppage->node, &gzvm->pinned_pages);
+		kfree(ppage);
+	}
+}
+
 static void gzvm_destroy_vm(struct gzvm *gzvm)
 {
 	pr_debug("VM-%u is going to be destroyed\n", gzvm->vm_id);
@@ -314,6 +330,9 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)
 
 	mutex_unlock(&gzvm->lock);
 
+	/* No need to lock here because execution is single-threaded */
+	gzvm_destroy_all_ppage(gzvm);
+
 	kfree(gzvm);
 }
 
@@ -349,6 +368,8 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
 	gzvm->vm_id = ret;
 	gzvm->mm = current->mm;
 	mutex_init(&gzvm->lock);
+	mutex_init(&gzvm->mem_lock);
+	gzvm->pinned_pages = RB_ROOT;
 
 	ret = gzvm_vm_irqfd_init(gzvm);
 	if (ret) {
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 54ac91670611..7d2f0b07ad84 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * For the normal physical address, the highest 12 bits should be zero, so we
@@ -97,6 +98,12 @@ struct gzvm_vcpu {
 	struct gzvm_vcpu_hwstate *hwstate;
 };
 
+struct gzvm_pinned_page {
+	struct rb_node node;
+	struct page *page;
+	u64 ipa;
+};
+
 /**
  * struct gzvm: the following data structures are for data transferring between
  * driver and hypervisor, and they're aligned with hypervisor definitions.
@@ -112,6 +119,8 @@ struct gzvm_vcpu {
  * @irq_ack_notifier_list: list head for irq ack notifier
  * @irq_srcu: structure data for SRCU(sleepable rcu)
  * @irq_lock: lock for irq injection
+ * @pinned_pages: rb-tree recording the pages pinned for this VM
+ * @mem_lock: lock for memory operations
  */
 struct gzvm {
 	struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
@@ -135,6 +144,9 @@ struct gzvm {
 	struct hlist_head irq_ack_notifier_list;
 	struct srcu_struct irq_srcu;
 	struct mutex irq_lock;
+
+	struct rb_root pinned_pages;
+	struct mutex mem_lock;
 };
 
 long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -160,6 +172,8 @@ int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
 int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
 			    u64 *hva_memslot);
 int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id);
+int gzvm_vm_allocate_guest_page(struct gzvm *gzvm, struct gzvm_memslot *slot,
+				u64 gfn, u64 *pfn);
 
 int gzvm_vm_ioctl_create_vcpu(struct gzvm *gzvm, u32 cpuid);
 int gzvm_arch_vcpu_update_one_reg(struct gzvm_vcpu *vcpu, __u64 reg_id,