From patchwork Mon Feb 13 11:59:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13138313
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org,
 tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
 dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
 kirill.shutemov@linux.intel.com, ying.huang@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com,
 ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, david@redhat.com,
 bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com,
 kai.huang@intel.com
Subject: [PATCH v9 17/18] x86/virt/tdx: Flush cache in kexec() when TDX is
 enabled
Date: Tue, 14 Feb 2023 00:59:24 +1300
X-Mailer: git-send-email 2.39.1

There are two problems in terms of using kexec() to boot to a new kernel
when the old kernel has enabled TDX: 1) Part of the memory pages are
still TDX private pages; 2) There might be dirty cachelines associated
with TDX private pages.

The first problem doesn't matter.  KeyID 0 doesn't have an integrity
check.  Even if the new kernel wants to use any non-zero KeyID, it needs
to convert the memory to that KeyID, and such conversion would work from
any KeyID.

However the old kernel needs to guarantee there's no dirty cacheline
left behind before booting to the new kernel to avoid silent corruption
from later cacheline writeback (Intel hardware doesn't guarantee cache
coherency across different KeyIDs).

There are two things that the old kernel needs to do to achieve that:

 1) Stop accessing TDX private memory mappings:
    a. Stop making TDX module SEAMCALLs (TDX global KeyID);
    b. Stop TDX guests from running (per-guest TDX KeyID).
 2) Flush any cachelines from previous TDX private KeyID writes.

For 2), use wbinvd() to flush cache in stop_this_cpu(), following SME
support.  And in this way 1) happens for free as there's no TDX activity
between wbinvd() and the native_halt().

Theoretically, cache flush is only needed when the TDX module has been
initialized.  However initializing the TDX module is done on demand at
runtime, and it takes a mutex to read the module status.  Just check
whether TDX is enabled by the BIOS instead to flush cache.

Signed-off-by: Kai Huang
Reviewed-by: Isaku Yamahata
---

v8 -> v9:
 - Various changelog enhancements and fixes (Dave).
 - Improved comment (Dave).

v7 -> v8:
 - Changelog:
   - Removed "leave TDX module open" part because the shut down patch
     has been removed.

v6 -> v7:
 - Improved changelog to explain why TDX private pages are not converted
   back to normal.

---
 arch/x86/kernel/process.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 40d156a31676..5876dda412c7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -765,8 +765,13 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * The TDX module or guests might have left dirty cachelines
+	 * behind.  Flush them to avoid corruption from later writeback.
+	 * Note that this flushes on all systems where TDX is possible,
+	 * but does not actually check that TDX was in use.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*
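
For illustration only, the flush condition changed by the hunk above can be
modelled in user space. This is a minimal sketch, not kernel code: the
function name needs_cache_flush() and both of its parameters are
hypothetical stand-ins for the value returned by cpuid_eax(0x8000001f) and
the result of platform_tdx_enabled(), and BIT() is redefined locally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (1u << (n))

/*
 * Hypothetical user-space model of the condition in stop_this_cpu().
 *
 * cpuid_8000001f_eax: assumed to be the raw EAX value of CPUID leaf
 *                     0x8000001f, whose bit 0 the existing code tests
 *                     for SME.
 * tdx_enabled_by_bios: assumed to be what platform_tdx_enabled() would
 *                      report, i.e. TDX enabled by the BIOS, regardless
 *                      of whether the TDX module was ever initialized.
 */
static bool needs_cache_flush(uint32_t cpuid_8000001f_eax,
			      bool tdx_enabled_by_bios)
{
	/* '&' binds tighter than '||', matching the expression in the patch. */
	return (cpuid_8000001f_eax & BIT(0)) || tdx_enabled_by_bios;
}
```

Under these assumptions the model returns true whenever either condition
holds, so a system with TDX enabled by the BIOS flushes even if TDX was
never used, which is the over-flushing the new comment calls out.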