From patchwork Fri Dec 9 06:52:29 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13069302
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org, tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v8 08/16] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions
Date: Fri, 9 Dec 2022 19:52:29 +1300

After the kernel selects all TDX-usable memory regions, the kernel needs
to pass those regions to the TDX module via the "TD Memory Region" (TDMR)
data structure.

Add a placeholder to construct a list of TDMRs (in multiple steps) to
cover all TDX-usable memory regions.

=== Long Version ===

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees. The list of these
ranges is available to the kernel by querying the TDX module.

The TDX architecture needs additional metadata to record things like
which TD guest "owns" a given page of memory.
This metadata essentially serves as the 'struct page' for the TDX module.
The space for this metadata is not reserved by the hardware up front and
must be allocated by the kernel and given to the TDX module.

Since this metadata consumes space, the VMM can choose whether or not to
allocate it for a given area of convertible memory. If it chooses not
to, the memory cannot receive TDX protections and cannot be used by TDX
guests as private memory.

For every memory region that the VMM wants to use as TDX memory, it sets
up a "TD Memory Region" (TDMR). Each TDMR represents a physically
contiguous convertible range and must also have its own physically
contiguous metadata table, referred to as a Physical Address Metadata
Table (PAMT), to track status for each page in the TDMR range.

Unlike a CMR, each TDMR requires 1G granularity and alignment. To
support physical RAM areas that don't meet those strict requirements,
each TDMR permits a number of internal "reserved areas" which can be
placed over memory holes. If PAMT metadata is placed within a TDMR it
must be covered by one of these reserved areas.

Let's summarize the concepts:

 CMR - Firmware-enumerated physical ranges that support TDX. CMRs
       are 4K aligned.
TDMR - Physical address range which is chosen by the kernel to support
       TDX. 1G granularity and alignment required. Each TDMR has
       reserved areas where TDX memory holes and overlapping PAMTs can
       be represented.
PAMT - Physically contiguous TDX metadata. One table for each page size
       per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G
       PAMT.

As one step of initializing the TDX module, the kernel configures
TDX-usable memory regions by passing a list of TDMRs to the TDX module.

Constructing the list of TDMRs consists of the steps below:

1) Fill out TDMRs to cover all memory regions that the TDX module will
   use for TD memory.
2) Allocate and set up PAMT for each TDMR.
3) Designate reserved areas for each TDMR.

Add a placeholder to construct TDMRs to do the above steps.
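To make the alignment and size figures in the summary concrete, here is a
minimal userspace sketch. It is not part of this patch and not how later
patches implement the steps: the names tdmr_cover(), pamt_size_estimate()
and TDMR_ALIGN_1G are hypothetical, and the flat 1/256 ratio is only the
rough estimate quoted above (real PAMT sizing is per page size).

```c
#include <stdint.h>

/* Assumption: 1G TDMR granularity and alignment, as summarized above. */
#define TDMR_ALIGN_1G	(1ULL << 30)

/* Round @x down to a power-of-two boundary @a. */
static inline uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/* Round @x up to a power-of-two boundary @a. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: widen the 4K-aligned region [start, end) to the
 * smallest 1G-aligned range that covers it.  Anything in the widened
 * range that is not real TDX memory would need a reserved area.
 */
static inline void tdmr_cover(uint64_t start, uint64_t end,
			      uint64_t *base, uint64_t *size)
{
	*base = align_down(start, TDMR_ALIGN_1G);
	*size = align_up(end, TDMR_ALIGN_1G) - *base;
}

/* Rough PAMT size for a TDMR, using the ~1/256 ratio quoted above. */
static inline uint64_t pamt_size_estimate(uint64_t tdmr_size)
{
	return tdmr_size / 256;
}
```

For example, a region from 1M to 3.5G would be covered by a TDMR with base
0 and size 4G, and a 256G TDMR would need roughly 1G of PAMT.
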
Always free the space for TDMRs at the end of the module initialization
(whether it succeeds or not) as TDMRs are only used during the
initialization.

Reviewed-by: Isaku Yamahata
Signed-off-by: Kai Huang
---

v7 -> v8:
 - Improved changelog to tell this is one step of "TODO list" in
   init_tdx_module().
 - Other changelog improvement suggested by Dave (with "Create TDMRs" to
   "Fill out TDMRs" to align with the code).
 - Added a "TODO list" comment to lay out the steps to construct TDMRs,
   following the same idea of "TODO list" in init_tdx_module().
 - Introduced 'struct tdmr_info_list' (Dave)
 - Further added additional members (tdmr_sz/max_tdmrs/nr_tdmrs) to
   simplify getting TDMR by given index, and reduce passing arguments
   around functions.
 - Added alloc_tdmr_list()/free_tdmr_list() accordingly, which internally
   uses tdmr_size_single() (Dave).
 - tdmr_num -> nr_tdmrs (Dave).

v6 -> v7:
 - Improved commit message to explain 'int' overflow cannot happen
   in cal_tdmr_size() and alloc_tdmr_array(). -- Andy/Dave.

v5 -> v6:
 - construct_tdmrs_memblock() -> construct_tdmrs() as 'tdx_memblock' is
   used instead of memblock.
 - Added Isaku's Reviewed-by.

v3 -> v5 (no feedback on v4):
 - Moved calculating TDMR size to this patch.
 - Changed to use alloc_pages_exact() to allocate buffer for all TDMRs
   once, instead of allocating each TDMR individually.
 - Removed "crypto protection" in the changelog.
 - -EFAULT -> -EINVAL in a couple of places.

---
 arch/x86/virt/vmx/tdx/tdx.c | 104 +++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |  23 ++++++++
 2 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f010402f443d..d36ac72ef299 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -347,6 +348,86 @@ static int build_tdx_memlist(struct list_head *tmb_list)
 	return ret;
 }
 
+struct tdmr_info_list {
+	struct tdmr_info *first_tdmr;
+	int tdmr_sz;
+	int max_tdmrs;
+	int nr_tdmrs;	/* Actual number of TDMRs */
+};
+
+/* Calculate the actual TDMR size */
+static int tdmr_size_single(u16 max_reserved_per_tdmr)
+{
+	int tdmr_sz;
+
+	/*
+	 * The actual size of TDMR depends on the maximum
+	 * number of reserved areas.
+	 */
+	tdmr_sz = sizeof(struct tdmr_info);
+	tdmr_sz += sizeof(struct tdmr_reserved_area) * max_reserved_per_tdmr;
+
+	return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT);
+}
+
+static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
+			   struct tdsysinfo_struct *sysinfo)
+{
+	size_t tdmr_sz, tdmr_array_sz;
+	void *tdmr_array;
+
+	tdmr_sz = tdmr_size_single(sysinfo->max_reserved_per_tdmr);
+	tdmr_array_sz = tdmr_sz * sysinfo->max_tdmrs;
+
+	/*
+	 * To keep things simple, allocate all TDMRs together.
+	 * The buffer needs to be physically contiguous to make
+	 * sure each TDMR is physically contiguous.
+	 */
+	tdmr_array = alloc_pages_exact(tdmr_array_sz,
+			GFP_KERNEL | __GFP_ZERO);
+	if (!tdmr_array)
+		return -ENOMEM;
+
+	tdmr_list->first_tdmr = tdmr_array;
+	/*
+	 * Keep the size of TDMR to find the target TDMR
+	 * at a given index in the TDMR list.
+	 */
+	tdmr_list->tdmr_sz = tdmr_sz;
+	tdmr_list->max_tdmrs = sysinfo->max_tdmrs;
+	tdmr_list->nr_tdmrs = 0;
+
+	return 0;
+}
+
+static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
+{
+	free_pages_exact(tdmr_list->first_tdmr,
+			tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
+}
+
+/*
+ * Construct a list of TDMRs on the preallocated space in @tdmr_list
+ * to cover all TDX memory regions in @tmb_list based on the TDX module
+ * information in @sysinfo.
+ */
+static int construct_tdmrs(struct list_head *tmb_list,
+			   struct tdmr_info_list *tdmr_list,
+			   struct tdsysinfo_struct *sysinfo)
+{
+	/*
+	 * TODO:
+	 *
+	 * - Fill out TDMRs to cover all TDX memory regions.
+	 * - Allocate and set up PAMTs for each TDMR.
+	 * - Designate reserved areas for each TDMR.
+	 *
+	 * Return -EINVAL until constructing TDMRs is done
+	 */
+	return -EINVAL;
+}
+
 static int init_tdx_module(void)
 {
 	/*
@@ -358,6 +439,7 @@ static int init_tdx_module(void)
 			TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
 	struct cmr_info cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
 	struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
+	struct tdmr_info_list tdmr_list;
 	int ret;
 
 	ret = tdx_get_sysinfo(sysinfo, cmr_array);
@@ -380,11 +462,19 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/* Allocate enough space for constructing TDMRs */
+	ret = alloc_tdmr_list(&tdmr_list, sysinfo);
+	if (ret)
+		goto out_free_tdx_mem;
+
+	/* Cover all TDX-usable memory regions in TDMRs */
+	ret = construct_tdmrs(&tdx_memlist, &tdmr_list, sysinfo);
+	if (ret)
+		goto out_free_tdmrs;
+
 	/*
 	 * TODO:
 	 *
-	 *  - Construct a list of TDMRs to cover all TDX-usable memory
-	 *    regions.
 	 *  - Pick up one TDX private KeyID as the global KeyID.
 	 *  - Configure the TDMRs and the global KeyID to the TDX module.
 	 *  - Configure the global KeyID on all packages.
@@ -393,6 +483,16 @@ static int init_tdx_module(void)
 	 *  Return error before all steps are done.
 	 */
 	ret = -EINVAL;
+out_free_tdmrs:
+	/*
+	 * Free the space for the TDMRs no matter the initialization is
+	 * successful or not.  They are not needed anymore after the
+	 * module initialization.
+	 */
+	free_tdmr_list(&tdmr_list);
+out_free_tdx_mem:
+	if (ret)
+		free_tdx_memlist(&tdx_memlist);
 out:
 	/*
 	 * @tdx_memlist is written here and read at memory hotplug time.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 6d32f62e4182..d0c762f1a94c 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -90,6 +90,29 @@ struct tdsysinfo_struct {
 	DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
 } __packed;
 
+struct tdmr_reserved_area {
+	u64 offset;
+	u64 size;
+} __packed;
+
+#define TDMR_INFO_ALIGNMENT	512
+
+struct tdmr_info {
+	u64 base;
+	u64 size;
+	u64 pamt_1g_base;
+	u64 pamt_1g_size;
+	u64 pamt_2m_base;
+	u64 pamt_2m_size;
+	u64 pamt_4k_base;
+	u64 pamt_4k_size;
+	/*
+	 * Actual number of reserved areas depends on
+	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
+	 */
+	DECLARE_FLEX_ARRAY(struct tdmr_reserved_area, reserved_areas);
+} __packed __aligned(TDMR_INFO_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!