From patchwork Fri Dec 9 06:52:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang
X-Patchwork-Id: 13069309
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org,
 tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
 dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
 kirill.shutemov@linux.intel.com, ying.huang@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com,
 ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
 sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v8 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is
 enabled
Date: Fri, 9 Dec 2022 19:52:36 +1300
X-Mailer: git-send-email 2.38.1

There are two problems with using kexec() to boot to a new kernel when
the old kernel has enabled TDX:

1) Some memory pages are still TDX private pages (i.e. metadata used by
   the TDX module, and any TDX guest memory if kexec() happens while a
   TDX guest is alive).
2) There might be dirty cachelines associated with TDX private pages.

Because the hardware doesn't guarantee cache coherency among different
KeyIDs, the old kernel needs to flush the cache (of those TDX private
pages) before booting to the new kernel.  Also, reading a TDX private
page using any shared non-TDX KeyID with integrity-check enabled can
trigger #MC.  Therefore, ideally, the kernel should convert all TDX
private pages back to normal before booting to the new kernel.

However, this implementation doesn't convert TDX private pages back to
normal in kexec() for the following reasons:

1) Neither the kernel nor the TDX module has existing infrastructure to
   track which pages are TDX private pages.
2) The number of TDX private pages can be large, and converting all of
   them (cache flush + using MOVDIR64B to clear the page) in kexec() can
   be time consuming.
3) The new kernel will almost always use only KeyID 0 to access memory.
   KeyID 0 doesn't support integrity-check, so reading former TDX
   private pages through it cannot trigger #MC.
4) The kernel doesn't (and may never) support MKTME.  If any 3rd party
   kernel ever supports MKTME, it can/should use MOVDIR64B to clear the
   page with the new MKTME KeyID (just like TDX does) before using it.

Therefore, this implementation just flushes the cache to make sure there
are no stale dirty cachelines associated with any TDX private KeyIDs
before booting to the new kernel; otherwise they may silently corrupt
the new kernel.

Following SME support, use wbinvd() to flush the cache in
stop_this_cpu().  Theoretically, the cache flush is only needed when the
TDX module has been initialized.  However, initializing the TDX module
is done on demand at runtime, and reading the module status requires
taking a mutex.  Instead, just check whether TDX is enabled by the BIOS
to decide whether to flush the cache.

Reviewed-by: Isaku Yamahata
Signed-off-by: Kai Huang
---
v7 -> v8:
 - Changelog:
   - Removed the "leave TDX module open" part because the shut down
     patch has been removed.

v6 -> v7:
 - Improved changelog to explain why TDX private pages are not
   converted back to normal.

---
 arch/x86/kernel/process.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c21b7347a26d..0cc84977dc62 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -765,8 +765,14 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * Similar to SME, if the TDX module is ever initialized, the
+	 * cachelines associated with any TDX private KeyID must be flushed
+	 * before transiting to the new kernel.  The TDX module is initialized
+	 * on demand, and it takes the mutex to read its status.  Just check
+	 * whether TDX is enabled by BIOS instead to flush cache.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*
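
For readers who want to reason about the new condition outside the
kernel, here is a minimal userspace sketch of the decision made in
stop_this_cpu().  It is not kernel code: needs_cache_flush(), its
parameters, SME_CPUID_BIT and self_check() are hypothetical names
introduced only for illustration.  The first parameter stands in for the
value returned by cpuid_eax(0x8000001f) and the second for the result of
platform_tdx_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of CPUID leaf 0x8000001f EAX, the bit the existing SME check tests. */
#define SME_CPUID_BIT (UINT32_C(1) << 0)

/*
 * Hypothetical model of the patched condition: flush the cache if the
 * CPU reports SME, or if TDX was enabled by the BIOS.  The TDX input is
 * deliberately the coarse "enabled by BIOS" state, not "module
 * initialized", mirroring the patch's choice to avoid taking a mutex.
 */
bool needs_cache_flush(uint32_t cpuid_8000001f_eax, bool tdx_enabled_by_bios)
{
	return (cpuid_8000001f_eax & SME_CPUID_BIT) || tdx_enabled_by_bios;
}

/* Usage example covering the four input combinations. */
void self_check(void)
{
	assert(needs_cache_flush(0x1, false));   /* SME only */
	assert(needs_cache_flush(0x0, true));    /* TDX only */
	assert(needs_cache_flush(0x1, true));    /* both */
	assert(!needs_cache_flush(0x0, false));  /* neither: no flush */
}
```

The sketch makes the trade-off visible: with TDX enabled by the BIOS the
flush happens even if the TDX module was never initialized, which costs
an unnecessary WBINVD in that case but keeps the shutdown path free of
locking.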