From patchwork Wed Jun 22 11:16:19 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890529
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, akpm@linux-foundation.org, kai.huang@intel.com
Subject: [PATCH v5 05/22] x86/virt/tdx: Prevent hot-add driver managed memory
Date: Wed, 22 Jun 2022 23:16:19 +1200
Message-Id: <173e1f9b2348f29e5f7d939855b8dd98625bcb35.1655894131.git.kai.huang@intel.com>

TDX provides increased levels of memory confidentiality and integrity. This requires special hardware support for features like memory encryption and storage of memory integrity checksums. Not all memory satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region" (CMR). During boot, the firmware builds a list of all of the memory ranges which can provide the TDX security guarantees. The kernel can obtain this list by querying the TDX module.

However, these TDX-capable memory regions are not automatically usable by the TDX module. The kernel needs to choose which convertible memory regions become TDX-usable memory and pass those regions to the TDX module when initializing the module.
Once those ranges are passed to the TDX module, the TDX-usable memory regions are fixed for the module's lifetime.

To avoid having to modify the page allocator to distinguish between TDX and non-TDX memory allocations, this implementation guarantees that all pages managed by the page allocator are TDX memory. Any memory hot-added to the page allocator would therefore break this guarantee, so it must be prevented.

There are basically two memory hot-add cases that need to be prevented: ACPI memory hot-add and driver managed memory hot-add. However, adding new memory to ZONE_DEVICE should not be prevented, as those pages are not managed by the page allocator. Therefore the memremap_pages() variants should still be allowed, even though they also use memory hotplug functions internally.

ACPI memory hotplug is already prevented. To prevent driver managed memory hot-add while still allowing the memremap_pages() variants to work, add a __weak hook to do an arch-specific check in add_memory_resource(). Implement the x86 version to prevent any new memory region from being added when TDX is enabled by the BIOS.

A __weak arch-specific hook is used instead of a new CC_ATTR, similar to the one used to disable software CPU hotplug, because some driver managed memory resources may actually be TDX-capable (such as legacy PMEM, which is really RAM underneath), and the arch-specific hook can be enhanced further to allow those when needed.

Note that an arch-specific hook for __remove_memory() is not required: neither ACPI hot-removal nor driver managed memory removal can reach it.
Signed-off-by: Kai Huang
---
 arch/x86/mm/init_64.c          | 21 +++++++++++++++++++++
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            | 15 +++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 96d34ebb20a9..ce89cf88a818 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
@@ -972,6 +973,26 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return add_pages(nid, start_pfn, nr_pages, params);
 }
 
+int arch_memory_add_precheck(int nid, u64 start, u64 size, mhp_t mhp_flags)
+{
+	if (!platform_tdx_enabled())
+		return 0;
+
+	/*
+	 * TDX needs to guarantee all pages managed by the page allocator
+	 * are TDX memory in order to not have to distinguish TDX and
+	 * non-TDX memory allocation. The kernel needs to pass the
+	 * TDX-usable memory regions to the TDX module when it gets
+	 * initialized. After that, the TDX-usable memory regions are
+	 * fixed. This means any memory hot-add to the page allocator
+	 * will break above guarantee thus should be prevented.
+	 */
+	pr_err("Unable to add memory [0x%llx, 0x%llx) on TDX enabled platform.\n",
+	       start, start + size);
+
+	return -EINVAL;
+}
+
 static void __meminit free_pagetable(struct page *page, int order)
 {
 	unsigned long magic;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1ce6f8044f1e..306ef4ceb419 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -325,6 +325,8 @@ extern int add_memory_resource(int nid, struct resource *resource,
 extern int add_memory_driver_managed(int nid, u64 start, u64 size,
 				     const char *resource_name,
 				     mhp_t mhp_flags);
+extern int arch_memory_add_precheck(int nid, u64 start, u64 size,
+		mhp_t mhp_flags);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 				   unsigned long nr_pages,
 				   struct vmem_altmap *altmap, int migratetype);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 416b38ca8def..2ad4b2603c7c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1296,6 +1296,17 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
 }
 
+/*
+ * Pre-check whether hot-add memory is allowed before arch_add_memory().
+ *
+ * Arch to provide replacement version if required.
+ */
+int __weak arch_memory_add_precheck(int nid, u64 start, u64 size,
+		mhp_t mhp_flags)
+{
+	return 0;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1319,6 +1330,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	if (ret)
 		return ret;
 
+	ret = arch_memory_add_precheck(nid, start, size, mhp_flags);
+	if (ret)
+		return ret;
+
 	if (mhp_flags & MHP_NID_IS_MGID) {
 		group = memory_group_find_by_id(nid);
 		if (!group)