From patchwork Thu Mar 30 11:49:49 2023
X-Patchwork-Submitter: "Kirill A . Shutemov"
X-Patchwork-Id: 13194074
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Borislav Petkov, Andy Lutomirski, Sean Christopherson, Andrew Morton,
 Joerg Roedel, Ard Biesheuvel
Cc: Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes, Vlastimil Babka,
 Tom Lendacky, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar,
 Dario Faggioli, Dave Hansen, Mike Rapoport, David Hildenbrand, Mel Gorman,
 marcelo.cerri@canonical.com, tim.gardner@canonical.com,
 khalid.elmously@canonical.com, philip.cox@canonical.com,
 aarcange@redhat.com, peterx@redhat.com, x86@kernel.org, linux-mm@kvack.org,
 linux-coco@lists.linux.dev, linux-efi@vger.kernel.org,
 linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv9 07/14] efi/x86: Implement support for unaccepted memory
Date: Thu, 30 Mar 2023 14:49:49 +0300
Message-Id: <20230330114956.20342-8-kirill.shutemov@linux.intel.com>
In-Reply-To: <20230330114956.20342-1-kirill.shutemov@linux.intel.com>
References: <20230330114956.20342-1-kirill.shutemov@linux.intel.com>

UEFI Specification version 2.9 introduces the concept of memory
acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, require memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific to the Virtual
Machine platform.

Accepting memory is costly, and it makes the VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until the memory is needed. This lowers boot time and reduces
memory overhead.

The kernel needs to know what memory has been accepted.
Firmware communicates this information via the memory map: a new memory
type -- EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 has to be modified on every page acceptance. This leads
to table fragmentation, and the e820 table has a limited number of
entries.

Another option is to mark such memory as usable in e820 and track in a
bitmap whether each range has been accepted. One bit in the bitmap
represents 2MiB in the address space: one 4k page is enough to track
64GiB of physical address space.

In the worst-case scenario -- a huge hole in the middle of the
address space -- it needs 256MiB to handle 4PiB of the address
space.

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The bitmap is allocated and constructed in the EFI stub and passed down
to the kernel via boot_params. allocate_e820() allocates the bitmap if
unaccepted memory is present, according to the maximum address in the
memory map.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/zero-page.rst          |  1 +
 arch/x86/boot/compressed/Makefile        |  1 +
 arch/x86/boot/compressed/mem.c           | 73 ++++++++++++++++++++++++
 arch/x86/include/asm/unaccepted_memory.h | 10 ++++
 arch/x86/include/uapi/asm/bootparam.h    |  2 +-
 drivers/firmware/efi/Kconfig             | 14 +++++
 drivers/firmware/efi/efi.c               |  1 +
 drivers/firmware/efi/libstub/x86-stub.c  | 65 +++++++++++++++++++++
 include/linux/efi.h                      |  3 +-
 9 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/mem.c
 create mode 100644 arch/x86/include/asm/unaccepted_memory.h

diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst
index 45aa9cceb4f1..f21905e61ade 100644
--- a/Documentation/x86/zero-page.rst
+++ b/Documentation/x86/zero-page.rst
@@ -20,6 +20,7 @@ Offset/Size	Proto		Name			Meaning
 060/010		ALL		ist_info		Intel SpeedStep (IST) BIOS support information
 							(struct ist_info)
 070/008		ALL		acpi_rsdp_addr		Physical address of ACPI RSDP table
+078/008		ALL		unaccepted_memory	Bitmap of unaccepted memory (1bit == 2M)
 080/010		ALL		hd0_info		hd0 disk parameter, OBSOLETE!!
 090/010		ALL		hd1_info		hd1 disk parameter, OBSOLETE!!
 0A0/010		ALL		sys_desc_table		System description table (struct sys_desc_table),
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 6b6cfe607bdb..f62c02348f9a 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -107,6 +107,7 @@ endif

 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
+vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/mem.o

 vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
new file mode 100644
index 000000000000..6b15a0ed8b54
--- /dev/null
+++ b/arch/x86/boot/compressed/mem.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "../cpuflags.h"
+#include "bitmap.h"
+#include "error.h"
+#include "math.h"
+
+#define PMD_SHIFT	21
+#define PMD_SIZE	(_AC(1, UL) << PMD_SHIFT)
+#define PMD_MASK	(~(PMD_SIZE - 1))
+
+static inline void __accept_memory(phys_addr_t start, phys_addr_t end)
+{
+	/* Platform-specific memory-acceptance call goes here */
+	error("Cannot accept memory");
+}
+
+/*
+ * The accepted memory bitmap only works at PMD_SIZE granularity.  Take
+ * unaligned start/end addresses and either:
+ *  1. Accepts the memory immediately and in its entirety
+ *  2. Accepts unaligned parts, and marks *some* aligned part unaccepted
+ *
+ * The function will never reach the bitmap_set() with zero bits to set.
+ */
+void process_unaccepted_memory(struct boot_params *params, u64 start, u64 end)
+{
+	/*
+	 * Ensure that at least one bit will be set in the bitmap by
+	 * immediately accepting all regions under 2*PMD_SIZE.  This is
+	 * imprecise and may immediately accept some areas that could
+	 * have been represented in the bitmap.  But, results in simpler
+	 * code below
+	 *
+	 * Consider case like this:
+	 *
+	 * | 4k | 2044k |    2048k   |
+	 * ^ 0x0        ^ 2MB        ^ 4MB
+	 *
+	 * Only the first 4k has been accepted. The 0MB->2MB region can not be
+	 * represented in the bitmap. The 2MB->4MB region can be represented in
+	 * the bitmap. But, the 0MB->4MB region is <2*PMD_SIZE and will be
+	 * immediately accepted in its entirety.
+	 */
+	if (end - start < 2 * PMD_SIZE) {
+		__accept_memory(start, end);
+		return;
+	}
+
+	/*
+	 * No matter how the start and end are aligned, at least one unaccepted
+	 * PMD_SIZE area will remain to be marked in the bitmap.
+	 */
+
+	/* Immediately accept a <PMD_SIZE piece at the beginning: */
+	if (start & ~PMD_MASK) {
+		__accept_memory(start, round_up(start, PMD_SIZE));
+		start = round_up(start, PMD_SIZE);
+	}
+
+	/* Immediately accept a <PMD_SIZE piece at the end: */
+	if (end & ~PMD_MASK) {
+		__accept_memory(round_down(end, PMD_SIZE), end);
+		end = round_down(end, PMD_SIZE);
+	}
+
+	/*
+	 * 'start' and 'end' are now both PMD-aligned.
+	 * Record the range as being unaccepted:
+	 */
+	bitmap_set((unsigned long *)params->unaccepted_memory,
+		   start / PMD_SIZE, (end - start) / PMD_SIZE);
+}
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
new file mode 100644
index 000000000000..df0736d32858
--- /dev/null
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_UNACCEPTED_MEMORY_H
+#define _ASM_X86_UNACCEPTED_MEMORY_H
+
+struct boot_params;
+
+void process_unaccepted_memory(struct boot_params *params, u64 start, u64 end);
+
+#endif
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 01d19fc22346..630a54046af0 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -189,7 +189,7 @@ struct boot_params {
 	__u64  tboot_addr;				/* 0x058 */
 	struct ist_info ist_info;			/* 0x060 */
 	__u64 acpi_rsdp_addr;				/* 0x070 */
-	__u8  _pad3[8];					/* 0x078 */
+	__u64 unaccepted_memory;			/* 0x078 */
 	__u8  hd0_info[16];	/* obsolete! */		/* 0x080 */
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
 	struct sys_desc_table sys_desc_table; /* obsolete! */	/* 0x0a0 */
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 043ca31c114e..231f1c70d1db 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -269,6 +269,20 @@ config EFI_COCO_SECRET
 	  virt/coco/efi_secret module to access the secrets, which in turn
 	  allows userspace programs to access the injected secrets.

+config UNACCEPTED_MEMORY
+	bool
+	depends on EFI_STUB
+	help
+	  Some Virtual Machine platforms, such as Intel TDX, require
+	  some memory to be "accepted" by the guest before it can be used.
+	  This mechanism helps prevent malicious hosts from making changes
+	  to guest memory.
+
+	  UEFI specification v2.9 introduced EFI_UNACCEPTED_MEMORY memory type.
+
+	  This option adds support for unaccepted memory and makes such memory
+	  usable by the kernel.
+
 config EFI_EMBEDDED_FIRMWARE
 	bool
 	select CRYPTO_LIB_SHA256
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index abeff7dc0b58..7dce06e419c5 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -843,6 +843,7 @@ static __initdata char memory_type_name[][13] = {
 	"MMIO Port",
 	"PAL Code",
 	"Persistent",
+	"Unaccepted",
 };

 char * __init efi_md_typeattr_format(char *buf, size_t size,
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index fff81843169c..1643ddbde249 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,6 +15,7 @@
 #include <asm/setup.h>
 #include <asm/desc.h>
 #include <asm/boot.h>
+#include <asm/unaccepted_memory.h>

 #include "efistub.h"

@@ -613,6 +614,16 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
 			e820_type = E820_TYPE_PMEM;
 			break;

+		case EFI_UNACCEPTED_MEMORY:
+			if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) {
+				efi_warn_once(
+"The system has unaccepted memory, but kernel does not support it\nConsider enabling CONFIG_UNACCEPTED_MEMORY\n");
+				continue;
+			}
+			e820_type = E820_TYPE_RAM;
+			process_unaccepted_memory(params, d->phys_addr,
+						  d->phys_addr + PAGE_SIZE * d->num_pages);
+			break;
 		default:
 			continue;
 		}
@@ -677,6 +688,57 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext,
 	return status;
 }

+static efi_status_t allocate_unaccepted_bitmap(struct boot_params *params,
+					       __u32 nr_desc,
+					       struct efi_boot_memmap *map)
+{
+	unsigned long *mem = NULL;
+	u64 size, max_addr = 0;
+	efi_status_t status;
+	bool found = false;
+	int i;
+
+	/* Check if there's any unaccepted memory and find the max address */
+	for (i = 0; i < nr_desc; i++) {
+		efi_memory_desc_t *d;
+		unsigned long m = (unsigned long)map->map;
+
+		d = efi_early_memdesc_ptr(m, map->desc_size, i);
+		if (d->type == EFI_UNACCEPTED_MEMORY)
+			found = true;
+		if (d->phys_addr + d->num_pages * PAGE_SIZE > max_addr)
+			max_addr = d->phys_addr + d->num_pages * PAGE_SIZE;
+	}
+
+	if (!found) {
+		params->unaccepted_memory = 0;
+		return EFI_SUCCESS;
+	}
+
+	/*
+	 * If unaccepted memory is present, allocate a bitmap to track what
+	 * memory has to be accepted before access.
+	 *
+	 * One bit in the bitmap represents 2MiB in the address space:
+	 * A 4k bitmap can track 64GiB of physical address space.
+	 *
+	 * In the worst case scenario -- a huge hole in the middle of the
+	 * address space -- It needs 256MiB to handle 4PiB of the address
+	 * space.
+	 *
+	 * The bitmap will be populated in setup_e820() according to the memory
+	 * map after efi_exit_boot_services().
+	 */
+	size = DIV_ROUND_UP(max_addr, PMD_SIZE * BITS_PER_BYTE);
+	status = efi_allocate_pages(size, (unsigned long *)&mem, ULONG_MAX);
+	if (status == EFI_SUCCESS) {
+		memset(mem, 0, size);
+		params->unaccepted_memory = (unsigned long)mem;
+	}
+
+	return status;
+}
+
 static efi_status_t allocate_e820(struct boot_params *params,
 				  struct setup_data **e820ext,
 				  u32 *e820ext_size)
@@ -697,6 +759,9 @@ static efi_status_t allocate_e820(struct boot_params *params,
 		status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
 	}

+	if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) && status == EFI_SUCCESS)
+		status = allocate_unaccepted_bitmap(params, nr_desc, map);
+
 	efi_bs_call(free_pool, map);
 	return status;
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 04a733f0ba95..1d4f0343c710 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -108,7 +108,8 @@ typedef struct {
 #define EFI_MEMORY_MAPPED_IO_PORT_SPACE	12
 #define EFI_PAL_CODE			13
 #define EFI_PERSISTENT_MEMORY		14
-#define EFI_MAX_MEMORY_TYPE		15
+#define EFI_UNACCEPTED_MEMORY		15
+#define EFI_MAX_MEMORY_TYPE		16

 /* Attribute values: */
 #define EFI_MEMORY_UC		((u64)0x0000000000000001ULL)	/* uncached */
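
Illustration only, not part of the patch: the bitmap arithmetic in the
commit message (one bit per 2MiB, 4k of bitmap per 64GiB, 256MiB of bitmap
for 4PiB) and the "accept the unaligned head and tail, mark the aligned
middle" rule can be checked with a small userspace sketch. The names
bitmap_bytes(), split_range() and struct split are hypothetical helpers
invented for this example. They only mirror the arithmetic described
above and perform no actual memory acceptance.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SHIFT 21
#define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)   /* 2MiB per bitmap bit */
#define PMD_MASK  (~(PMD_SIZE - 1))

/*
 * Hypothetical helper: bytes of bitmap needed to cover [0, max_addr),
 * i.e. DIV_ROUND_UP(max_addr, PMD_SIZE * 8). Assumes max_addr is far
 * enough below 2^64 that the rounding addition cannot overflow.
 */
static uint64_t bitmap_bytes(uint64_t max_addr)
{
	uint64_t span_per_byte = PMD_SIZE * 8;   /* 16MiB of address space per byte */

	return (max_addr + span_per_byte - 1) / span_per_byte;
}

/* Hypothetical result type describing how a range would be handled. */
struct split {
	uint64_t head_end;    /* [start, head_end) is accepted immediately */
	uint64_t tail_start;  /* [tail_start, end) is accepted immediately */
	uint64_t first_bit;   /* first bitmap bit to set */
	uint64_t nbits;       /* bits to set; 0 means the whole range was accepted */
};

/* Hypothetical helper: split [start, end) the way the commit message describes. */
static struct split split_range(uint64_t start, uint64_t end)
{
	struct split s = { end, end, 0, 0 };
	uint64_t aligned_start, aligned_end;

	assert(end >= start);

	/* Ranges under 2*PMD_SIZE are accepted eagerly in their entirety. */
	if (end - start < 2 * PMD_SIZE)
		return s;

	aligned_start = (start + PMD_SIZE - 1) & PMD_MASK;  /* round up */
	aligned_end = end & PMD_MASK;                       /* round down */

	s.head_end = aligned_start;
	s.tail_start = aligned_end;
	s.first_bit = aligned_start / PMD_SIZE;
	s.nbits = (aligned_end - aligned_start) / PMD_SIZE;
	return s;
}
```

With these definitions, bitmap_bytes(1ULL << 36) gives 4096 (one 4k page
for 64GiB) and bitmap_bytes(1ULL << 52) gives 256MiB for 4PiB. A range
from 4k to 6MiB yields an eagerly accepted head up to 2MiB and two bitmap
bits starting at bit 1, while a range from 4k to 4MiB is smaller than
2*PMD_SIZE and sets no bits.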