From patchwork Fri Apr 16 15:40:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12207993 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E3CDC43461 for ; Fri, 16 Apr 2021 15:41:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4746B613A9 for ; Fri, 16 Apr 2021 15:41:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343905AbhDPPmD (ORCPT ); Fri, 16 Apr 2021 11:42:03 -0400 Received: from mga04.intel.com ([192.55.52.120]:11046 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235551AbhDPPmB (ORCPT ); Fri, 16 Apr 2021 11:42:01 -0400 IronPort-SDR: 35tJIDHUOoUC+co8OdFG12X5kKl2dWd3dLemK635tu4gcajP2toI36zcqLX29FMz7sBz1LAjZ2 j+zmWxAgpumg== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="192931129" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="192931129" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:36 -0700 IronPort-SDR: 6IctWwvMhRST79zs3UT7X2lmhiZ2l9BRZTh8PbQxjhQJynga4+YurvugLpc+mbLJQ/BneIdPz3 BgBVQ2QiXENA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="461998771" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 16 Apr 2021 08:41:32 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id CAD0EBA; Fri, 16 Apr 2021 18:41:49 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 01/13] x86/mm: Move force_dma_unencrypted() to common code Date: Fri, 16 Apr 2021 18:40:54 +0300 Message-Id: <20210416154106.23721-2-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org force_dma_unencrypted() has to return true for KVM guest with the memory protected enabled. Move it out of AMD SME code. Introduce new config option X86_MEM_ENCRYPT_COMMON that has to be selected by all x86 memory encryption features. This is preparation for the following patches. Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 7 +++++- arch/x86/include/asm/io.h | 4 +++- arch/x86/mm/Makefile | 2 ++ arch/x86/mm/mem_encrypt.c | 30 ------------------------- arch/x86/mm/mem_encrypt_common.c | 38 ++++++++++++++++++++++++++++++++ 5 files changed, 49 insertions(+), 32 deletions(-) create mode 100644 arch/x86/mm/mem_encrypt_common.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 21f851179ff0..2b4ce1722dbd 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1520,14 +1520,19 @@ config X86_CPA_STATISTICS helps to determine the effectiveness of preserving large and huge page mappings when mapping protections are changed. +config X86_MEM_ENCRYPT_COMMON + select ARCH_HAS_FORCE_DMA_UNENCRYPTED + select DYNAMIC_PHYSICAL_MASK + def_bool n + config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD select DMA_COHERENT_POOL - select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT select ARCH_HAS_FORCE_DMA_UNENCRYPTED select INSTRUCTION_DECODER + select X86_MEM_ENCRYPT_COMMON help Say yes to enable support for the encryption of system memory. This requires an AMD processor that supports Secure Memory diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index d726459d08e5..6dc51b31cb0e 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -256,10 +256,12 @@ static inline void slow_down_io(void) #endif -#ifdef CONFIG_AMD_MEM_ENCRYPT #include extern struct static_key_false sev_enable_key; + +#ifdef CONFIG_AMD_MEM_ENCRYPT + static inline bool sev_key_active(void) { return static_branch_unlikely(&sev_enable_key); diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 5864219221ca..b31cb52bf1bd 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -52,6 +52,8 @@ obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o +obj-$(CONFIG_X86_MEM_ENCRYPT_COMMON) += mem_encrypt_common.o + obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index c3d5f0236f35..9ca477b9b8ba 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -15,10 +15,6 @@ #include #include #include -#include -#include -#include -#include #include #include @@ -390,32 +386,6 @@ bool noinstr sev_es_active(void) return sev_status & MSR_AMD64_SEV_ES_ENABLED; } -/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ -bool force_dma_unencrypted(struct device *dev) -{ - /* - * For SEV, all DMA must be to unencrypted addresses. - */ - if (sev_active()) - return true; - - /* - * For SME, all DMA must be to unencrypted addresses if the - * device does not support DMA to addresses that include the - * encryption mask. - */ - if (sme_active()) { - u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); - u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, - dev->bus_dma_limit); - - if (dma_dev_mask <= dma_enc_mask) - return true; - } - - return false; -} - void __init mem_encrypt_free_decrypted_mem(void) { unsigned long vaddr, vaddr_end, npages; diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c new file mode 100644 index 000000000000..dd791352f73f --- /dev/null +++ b/arch/x86/mm/mem_encrypt_common.c @@ -0,0 +1,38 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * AMD Memory Encryption Support + * + * Copyright (C) 2016 Advanced Micro Devices, Inc. + * + * Author: Tom Lendacky + */ + +#include +#include +#include + +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ +bool force_dma_unencrypted(struct device *dev) +{ + /* + * For SEV, all DMA must be to unencrypted/shared addresses. + */ + if (sev_active()) + return true; + + /* + * For SME, all DMA must be to unencrypted addresses if the + * device does not support DMA to addresses that include the + * encryption mask. + */ + if (sme_active()) { + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, + dev->bus_dma_limit); + + if (dma_dev_mask <= dma_enc_mask) + return true; + } + + return false; +} From patchwork Fri Apr 16 15:40:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12207991 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A83C8C43460 for ; Fri, 16 Apr 2021 15:41:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 878B4613A9 for ; Fri, 16 Apr 2021 15:41:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343893AbhDPPmD (ORCPT ); Fri, 16 Apr 2021 11:42:03 -0400 Received: from mga12.intel.com ([192.55.52.136]:63662 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235631AbhDPPmB (ORCPT ); Fri, 16 Apr 2021 11:42:01 -0400 IronPort-SDR: tpURSZqO87L9f1BQdEPWLLuhrpIecJPnyhaj8uchmjHy4U1KJUqpY0HnqZQfAFrkBb8CiyoL/W tVrV0ntgNRgA== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="174550436" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="174550436" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:36 -0700 IronPort-SDR: ZXxLiKI5NW+Jmqq7rMd7tPVDoK6IYF3qIiPH2QvjaxnQsltHLq7YI2g8gn1spdzGD9lsUe7Bxa BjCz3ty+CJIw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="453378436" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 16 Apr 2021 08:41:32 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id D58D2147; Fri, 16 Apr 2021 18:41:49 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 02/13] x86/kvm: Introduce KVM memory protection feature Date: Fri, 16 Apr 2021 18:40:55 +0300 Message-Id: <20210416154106.23721-3-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall. Host side doesn't provide the feature yet, so it is a dead code for now. Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/kvm_para.h | 5 +++++ arch/x86/include/uapi/asm/kvm_para.h | 3 ++- arch/x86/kernel/kvm.c | 17 +++++++++++++++++ include/uapi/linux/kvm_para.h | 3 ++- 5 files changed, 27 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 84b887825f12..d8f3d2619913 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -238,6 +238,7 @@ #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */ #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */ #define X86_FEATURE_VM_PAGE_FLUSH ( 8*32+21) /* "" VM Page Flush MSR is supported */ +#define X86_FEATURE_KVM_MEM_PROTECTED ( 8*32+22) /* "" KVM memory protection extension */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 338119852512..74aea18f3130 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -11,11 +11,16 @@ extern void kvmclock_init(void); #ifdef CONFIG_KVM_GUEST bool kvm_check_and_clear_guest_paused(void); +bool kvm_mem_protected(void); #else static inline bool kvm_check_and_clear_guest_paused(void) { return false; } +static inline bool kvm_mem_protected(void) +{ + return false; +} #endif /* CONFIG_KVM_GUEST */ #define KVM_HYPERCALL \ diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 950afebfba88..8d32c41861c9 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -28,11 +28,12 @@ #define KVM_FEATURE_PV_UNHALT 7 #define KVM_FEATURE_PV_TLB_FLUSH 9 #define KVM_FEATURE_ASYNC_PF_VMEXIT 10 -#define KVM_FEATURE_PV_SEND_IPI 11 +#define KVM_FEATURE_PV_SEND_IPI 11 #define KVM_FEATURE_POLL_CONTROL 12 #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 #define KVM_FEATURE_MSI_EXT_DEST_ID 15 +#define KVM_FEATURE_MEM_PROTECTED 16 #define KVM_HINTS_REALTIME 0 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 5e78e01ca3b4..aed6034fcac1 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -39,6 +39,13 @@ #include #include +static bool mem_protected; + +bool kvm_mem_protected(void) +{ + return mem_protected; +} + DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled); static int kvmapf = 1; @@ -749,6 +756,16 @@ static void __init kvm_init_platform(void) { kvmclock_init(); x86_platform.apic_post_init = kvm_apic_init; + + if (kvm_para_has_feature(KVM_FEATURE_MEM_PROTECTED)) { + if (kvm_hypercall0(KVM_HC_ENABLE_MEM_PROTECTED)) { + panic("Failed to enable KVM memory protection"); + } + + pr_info("KVM memory protection enabled\n"); + mem_protected = true; + setup_force_cpu_cap(X86_FEATURE_KVM_MEM_PROTECTED); + } } #if defined(CONFIG_AMD_MEM_ENCRYPT) diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 8b86609849b9..1a216f32e572 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -27,8 +27,9 @@ #define KVM_HC_MIPS_EXIT_VM 7 #define KVM_HC_MIPS_CONSOLE_OUTPUT 8 #define KVM_HC_CLOCK_PAIRING 9 -#define KVM_HC_SEND_IPI 10 +#define KVM_HC_SEND_IPI 10 #define KVM_HC_SCHED_YIELD 11 +#define KVM_HC_ENABLE_MEM_PROTECTED 12 /* * hypercalls use architecture specific From patchwork Fri Apr 16 15:40:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12207995 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F41F3C43600 for ; Fri, 16 Apr 2021 15:41:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDEA9613B0 for ; Fri, 16 Apr 2021 15:41:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343927AbhDPPmE (ORCPT ); Fri, 16 Apr 2021 11:42:04 -0400 Received: from mga12.intel.com ([192.55.52.136]:63662 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243449AbhDPPmC (ORCPT ); Fri, 16 Apr 2021 11:42:02 -0400 IronPort-SDR: MYQ5WHyWqi4eq+RLtc6h6T4PyD0VKMaSLilimbbgxNgZZutgsEAI09n4yxvFYXhWhMHHgxmLYj 0LnP85YfLcrA== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="174550437" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="174550437" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:37 -0700 IronPort-SDR: pd4rjOuDB7GQ8SDPFXTAOT408W4GzC6bMM/6th6kYTROmgUGT42oXh6q1tqsLOgM51ZsWI2pcP 85QBOIF9XVfw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="453378439" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 16 Apr 2021 08:41:32 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id DF99D1D2; Fri, 16 Apr 2021 18:41:49 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 03/13] x86/kvm: Make DMA pages shared Date: Fri, 16 Apr 2021 18:40:56 +0300 Message-Id: <20210416154106.23721-4-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Make force_dma_unencrypted() return true for KVM to get DMA pages mapped as shared. __set_memory_enc_dec() now informs the host via hypercall if the state of the page has changed from shared to private or back. Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 1 + arch/x86/mm/mem_encrypt_common.c | 5 +++-- arch/x86/mm/pat/set_memory.c | 10 ++++++++++ include/uapi/linux/kvm_para.h | 1 + 4 files changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2b4ce1722dbd..d197b3beb904 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -811,6 +811,7 @@ config KVM_GUEST select PARAVIRT_CLOCK select ARCH_CPUIDLE_HALTPOLL select X86_HV_CALLBACK_VECTOR + select X86_MEM_ENCRYPT_COMMON default y help This option enables various optimizations for running under the KVM diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c index dd791352f73f..6bf0718bb72a 100644 --- a/arch/x86/mm/mem_encrypt_common.c +++ b/arch/x86/mm/mem_encrypt_common.c @@ -10,14 +10,15 @@ #include #include #include +#include /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ bool force_dma_unencrypted(struct device *dev) { /* - * For SEV, all DMA must be to unencrypted/shared addresses. + * For SEV and KVM, all DMA must be to unencrypted/shared addresses. */ - if (sev_active()) + if (sev_active() || kvm_mem_protected()) return true; /* diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 16f878c26667..4b312d80033d 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -1977,6 +1978,15 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc) struct cpa_data cpa; int ret; + if (kvm_mem_protected()) { + /* TODO: Unsharing memory back */ + if (WARN_ON_ONCE(enc)) + return -ENOSYS; + + return kvm_hypercall2(KVM_HC_MEM_SHARE, + __pa(addr) >> PAGE_SHIFT, numpages); + } + /* Nothing to do if memory encryption is not active */ if (!mem_encrypt_active()) return 0; diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 1a216f32e572..09d36683ee0a 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -30,6 +30,7 @@ #define KVM_HC_SEND_IPI 10 #define KVM_HC_SCHED_YIELD 11 #define KVM_HC_ENABLE_MEM_PROTECTED 12 +#define KVM_HC_MEM_SHARE 13 /* * hypercalls use architecture specific From patchwork Fri Apr 16 15:40:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12207997 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C5E6C43461 for ; Fri, 16 Apr 2021 15:41:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 08D03613C5 for ; Fri, 16 Apr 2021 15:41:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343950AbhDPPmG (ORCPT ); Fri, 16 Apr 2021 11:42:06 -0400 Received: from mga03.intel.com ([134.134.136.65]:28316 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243404AbhDPPmC (ORCPT ); Fri, 16 Apr 2021 11:42:02 -0400 IronPort-SDR: lFIqVKb0g4aDqdFCXfcwOYCOMmtxoXQDBNnBB5WdyzoDxRSM5R9pvdWkgu5fqvjVU7qhEFpJys xImq/apc3jyg== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="195082411" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="195082411" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:36 -0700 IronPort-SDR: +Fi9GL9MKK5YxKTt0mZh2nWd6EnMe3+HzzehYBQwwfQpzsyfJKfda3/Mo32rF7rxY0yk2+D0Ay pZL4if72nuGQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="399940018" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 16 Apr 2021 08:41:32 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id E9C82202; Fri, 16 Apr 2021 18:41:49 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 04/13] x86/kvm: Use bounce buffers for KVM memory protection Date: Fri, 16 Apr 2021 18:40:57 +0300 Message-Id: <20210416154106.23721-5-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Mirror SEV, use SWIOTLB always if KVM memory protection is enabled. Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 1 + arch/x86/include/asm/mem_encrypt.h | 7 +++-- arch/x86/kernel/kvm.c | 2 ++ arch/x86/kernel/pci-swiotlb.c | 3 +- arch/x86/mm/mem_encrypt.c | 44 --------------------------- arch/x86/mm/mem_encrypt_common.c | 48 ++++++++++++++++++++++++++++++ 6 files changed, 57 insertions(+), 48 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d197b3beb904..c51d14db5620 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -812,6 +812,7 @@ config KVM_GUEST select ARCH_CPUIDLE_HALTPOLL select X86_HV_CALLBACK_VECTOR select X86_MEM_ENCRYPT_COMMON + select SWIOTLB default y help This option enables various optimizations for running under the KVM diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index 31c4df123aa0..a748b30c2f23 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -47,10 +47,8 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size); void __init mem_encrypt_free_decrypted_mem(void); -/* Architecture __weak replacement functions */ -void __init mem_encrypt_init(void); - void __init sev_es_init_vc_handling(void); + bool sme_active(void); bool sev_active(void); bool sev_es_active(void); @@ -91,6 +89,9 @@ static inline void mem_encrypt_free_decrypted_mem(void) { } #endif /* CONFIG_AMD_MEM_ENCRYPT */ +/* Architecture __weak replacement functions */ +void __init mem_encrypt_init(void); + /* * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when * writing to or comparing values from the cr3 register. Having the diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index aed6034fcac1..ba179f5ca198 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -765,6 +766,7 @@ static void __init kvm_init_platform(void) pr_info("KVM memory protection enabled\n"); mem_protected = true; setup_force_cpu_cap(X86_FEATURE_KVM_MEM_PROTECTED); + swiotlb_force = SWIOTLB_FORCE; } } diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c index c2cfa5e7c152..814060a6ceb0 100644 --- a/arch/x86/kernel/pci-swiotlb.c +++ b/arch/x86/kernel/pci-swiotlb.c @@ -13,6 +13,7 @@ #include #include #include +#include int swiotlb __read_mostly; @@ -49,7 +50,7 @@ int __init pci_swiotlb_detect_4gb(void) * buffers are allocated and used for devices that do not support * the addressing range required for the encryption mask. */ - if (sme_active()) + if (sme_active() || kvm_mem_protected()) swiotlb = 1; return swiotlb; diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index 9ca477b9b8ba..3478f20fb46f 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -409,47 +409,3 @@ void __init mem_encrypt_free_decrypted_mem(void) free_init_pages("unused decrypted", vaddr, vaddr_end); } - -static void print_mem_encrypt_feature_info(void) -{ - pr_info("AMD Memory Encryption Features active:"); - - /* Secure Memory Encryption */ - if (sme_active()) { - /* - * SME is mutually exclusive with any of the SEV - * features below. - */ - pr_cont(" SME\n"); - return; - } - - /* Secure Encrypted Virtualization */ - if (sev_active()) - pr_cont(" SEV"); - - /* Encrypted Register State */ - if (sev_es_active()) - pr_cont(" SEV-ES"); - - pr_cont("\n"); -} - -/* Architecture __weak replacement functions */ -void __init mem_encrypt_init(void) -{ - if (!sme_me_mask) - return; - - /* Call into SWIOTLB to update the SWIOTLB DMA buffers */ - swiotlb_update_mem_attributes(); - - /* - * With SEV, we need to unroll the rep string I/O instructions. - */ - if (sev_active()) - static_branch_enable(&sev_enable_key); - - print_mem_encrypt_feature_info(); -} - diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c index 6bf0718bb72a..351b77361a5d 100644 --- a/arch/x86/mm/mem_encrypt_common.c +++ b/arch/x86/mm/mem_encrypt_common.c @@ -11,6 +11,7 @@ #include #include #include +#include /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ bool force_dma_unencrypted(struct device *dev) @@ -37,3 +38,50 @@ bool force_dma_unencrypted(struct device *dev) return false; } + +static void print_mem_encrypt_feature_info(void) +{ + if (kvm_mem_protected()) { + pr_info("KVM memory protection enabled\n"); + return; + } + + pr_info("AMD Memory Encryption Features active:"); + + /* Secure Memory Encryption */ + if (sme_active()) { + /* + * SME is mutually exclusive with any of the SEV + * features below. + */ + pr_cont(" SME\n"); + return; + } + + /* Secure Encrypted Virtualization */ + if (sev_active()) + pr_cont(" SEV"); + + /* Encrypted Register State */ + if (sev_es_active()) + pr_cont(" SEV-ES"); + + pr_cont("\n"); +} + +void __init mem_encrypt_init(void) +{ + if (!sme_me_mask && !kvm_mem_protected()) + return; + + /* Call into SWIOTLB to update the SWIOTLB DMA buffers */ + swiotlb_update_mem_attributes(); + + /* + * With SEV, we need to unroll the rep string I/O instructions. + */ + if (sev_active()) + static_branch_enable(&sev_enable_key); + + print_mem_encrypt_feature_info(); +} From patchwork Fri Apr 16 15:40:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208001 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C77F7C43470 for ; Fri, 16 Apr 2021 15:41:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AD573613B4 for ; Fri, 16 Apr 2021 15:41:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243404AbhDPPmM (ORCPT ); Fri, 16 Apr 2021 11:42:12 -0400 Received: from mga03.intel.com ([134.134.136.65]:28335 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343976AbhDPPmJ (ORCPT ); Fri, 16 Apr 2021 11:42:09 -0400 IronPort-SDR: vd+KVI7AyPpE0REhN8NACwSZDg0YLMktVidU4geP6OQ4tl+TfYVKcHkiqulWZsN6N7eoe/MVML 6HC1ptSRRsiw== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="195082429" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="195082429" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:44 -0700 IronPort-SDR: 3HYELYXhQ0Mx8RIPOamGTgbBRpB0AncsovXkZNC/8m2JSdrcRS+o6IlhQliG1Z6k8LBgLQ4zTa Vj7+eJtHfebw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="453378493" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 16 Apr 2021 08:41:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id F3CE424D; Fri, 16 Apr 2021 18:41:49 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 05/13] x86/kvmclock: Share hvclock memory with the host Date: Fri, 16 Apr 2021 18:40:58 +0300 Message-Id: <20210416154106.23721-6-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org hvclock is shared between the guest and the hypervisor. It has to be accessible by host. Signed-off-by: Kirill A. Shutemov --- arch/x86/kernel/kvmclock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index aa593743acf6..3d004b278dba 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -252,7 +252,7 @@ static void __init kvmclock_init_mem(void) * hvclock is shared between the guest and the hypervisor, must * be mapped decrypted. */ - if (sev_active()) { + if (sev_active() || kvm_mem_protected()) { r = set_memory_decrypted((unsigned long) hvclock_mem, 1UL << order); if (r) { From patchwork Fri Apr 16 15:40:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208005 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB59BC433B4 for ; Fri, 16 Apr 2021 15:41:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9078C6121F for ; Fri, 16 Apr 2021 15:41:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344048AbhDPPmQ (ORCPT ); Fri, 16 Apr 2021 11:42:16 -0400 Received: from mga09.intel.com ([134.134.136.24]:4092 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343989AbhDPPmL (ORCPT ); Fri, 16 Apr 2021 11:42:11 -0400 IronPort-SDR: E9YfRotjPqIUZrKWWV/D64DLh8dwSCmaGzDza6oSmpdTNUauWpp8/Exd1T9XjguHtJEZIEQkFc tPMJ9eFMHzwQ== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="195169145" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="195169145" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:42 -0700 IronPort-SDR: X85JOUNwgOD0cizzR5uDukCMiBRDxeHXnklH8hj1bphP48a0OlK0J470EynoTDSDOOq7s8sGWp THnhFtf3GCoQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="384352246" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga006.jf.intel.com with ESMTP; 16 Apr 2021 08:41:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 0A0DB279; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 06/13] x86/realmode: Share trampoline area if KVM memory protection enabled Date: Fri, 16 Apr 2021 18:40:59 +0300 Message-Id: <20210416154106.23721-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org If KVM memory protection is active, the trampoline area will need to be in shared memory. Signed-off-by: Kirill A. Shutemov --- arch/x86/realmode/init.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 22fda7d99159..f3b54b5da693 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -10,6 +10,7 @@ #include #include #include +#include struct real_mode_header *real_mode_header; u32 *trampoline_cr4_features; @@ -75,11 +76,11 @@ static void __init setup_real_mode(void) base = (unsigned char *)real_mode_header; /* - * If SME is active, the trampoline area will need to be in - * decrypted memory in order to bring up other processors + * If SME or KVM memory protection is active, the trampoline area will + * need to be in decrypted memory in order to bring up other processors * successfully. This is not needed for SEV. */ - if (sme_active()) + if (sme_active() || kvm_mem_protected()) set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT); memcpy(base, real_mode_blob, size); From patchwork Fri Apr 16 15:41:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208003 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97441C43460 for ; Fri, 16 Apr 2021 15:41:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7DB4361222 for ; Fri, 16 Apr 2021 15:41:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344033AbhDPPmO (ORCPT ); Fri, 16 Apr 2021 11:42:14 -0400 Received: from mga17.intel.com ([192.55.52.151]:10800 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343970AbhDPPmI (ORCPT ); Fri, 16 Apr 2021 11:42:08 -0400 IronPort-SDR: uWHD7F5t2HzlTcto9mPtvJcDEq9sES5hTS581oUnaruuGCDkBRGkG43VurwDmUL8y3VJxymj3J 1XinxyK2Ceyw== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="175163936" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="175163936" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:43 -0700 IronPort-SDR: lGuaSt9Jbppuno2GL+IX51ChFHPEI5DU+6ronyLCHnvyol9GOMLKa/g0sci/4syDwSpPwbuOAE o2DRvdX+WXuw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="422038783" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga007.jf.intel.com with ESMTP; 16 Apr 2021 08:41:38 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 144FD2BE; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 07/13] mm: Add hwpoison_entry_to_pfn() and hwpoison_entry_to_page() Date: Fri, 16 Apr 2021 18:41:00 +0300 Message-Id: <20210416154106.23721-8-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add helpers to convert hwpoison swap entry to pfn and page. Signed-off-by: Kirill A. Shutemov --- include/linux/swapops.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index d9b7c9132c2f..520589b12fb3 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -323,6 +323,16 @@ static inline int is_hwpoison_entry(swp_entry_t entry) return swp_type(entry) == SWP_HWPOISON; } +static inline unsigned long hwpoison_entry_to_pfn(swp_entry_t entry) +{ + return swp_offset(entry); +} + +static inline struct page *hwpoison_entry_to_page(swp_entry_t entry) +{ + return pfn_to_page(hwpoison_entry_to_pfn(entry)); +} + static inline void num_poisoned_pages_inc(void) { atomic_long_inc(&num_poisoned_pages); @@ -345,6 +355,16 @@ static inline int is_hwpoison_entry(swp_entry_t swp) return 0; } +static inline unsigned long hwpoison_entry_to_pfn(swp_entry_t entry) +{ + return 0; +} + +static inline struct page *hwpoison_entry_to_page(swp_entry_t entry) +{ + return NULL; +} + static inline void num_poisoned_pages_inc(void) { } From patchwork Fri Apr 16 15:41:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12207999 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3CEEC43461 for ; Fri, 16 Apr 2021 15:41:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 95B65613CE for ; Fri, 16 Apr 2021 15:41:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343964AbhDPPmL (ORCPT ); Fri, 16 Apr 2021 11:42:11 -0400 Received: from mga09.intel.com ([134.134.136.24]:4092 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343943AbhDPPmH (ORCPT ); Fri, 16 Apr 2021 11:42:07 -0400 IronPort-SDR: UnfIaZTA1WTwaplTy6pGqT4QKIXSSwzVUf6K5A1MPONKw6BU73gyEc5XP1G6rrGUCsC+nJqgMS M69UNntXbv/A== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="195169144" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="195169144" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:41 -0700 IronPort-SDR: ar7dDxzt2Oi7GIhsG5qtuISAhRHVX9LRp+f1DQ8gtF04+3xB3T4x0fabS8zuRBBAEv60tGGXg6 I/rj8v9izkvQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="383087500" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga003.jf.intel.com with ESMTP; 16 Apr 2021 08:41:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 1DDE82CF; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 08/13] mm/gup: Add FOLL_ALLOW_POISONED Date: Fri, 16 Apr 2021 18:41:01 +0300 Message-Id: <20210416154106.23721-9-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The new flag allows to bypass check if the page is poisoned and get reference on it. Signed-off-by: Kirill A. Shutemov --- include/linux/mm.h | 1 + mm/gup.c | 29 ++++++++++++++++++----------- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ecdf8a8cd6ae..378a94481fd1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2802,6 +2802,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ +#define FOLL_ALLOW_POISONED 0x100000 /* bypass poison check */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each diff --git a/mm/gup.c b/mm/gup.c index e4c224cd9661..dd3b79b03eb5 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -384,22 +384,29 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, ptep = pte_offset_map_lock(mm, pmd, address, &ptl); pte = *ptep; if (!pte_present(pte)) { - swp_entry_t entry; + swp_entry_t entry = pte_to_swp_entry(pte); + + if (pte_none(pte)) + goto no_page; + /* * KSM's break_ksm() relies upon recognizing a ksm page * even while it is being migrated, so for that case we * need migration_entry_wait(). */ - if (likely(!(flags & FOLL_MIGRATION))) - goto no_page; - if (pte_none(pte)) - goto no_page; - entry = pte_to_swp_entry(pte); - if (!is_migration_entry(entry)) - goto no_page; - pte_unmap_unlock(ptep, ptl); - migration_entry_wait(mm, pmd, address); - goto retry; + if (is_migration_entry(entry) && (flags & FOLL_MIGRATION)) { + pte_unmap_unlock(ptep, ptl); + migration_entry_wait(mm, pmd, address); + goto retry; + } + + if (is_hwpoison_entry(entry) && (flags & FOLL_ALLOW_POISONED)) { + page = hwpoison_entry_to_page(entry); + get_page(page); + goto out; + } + + goto no_page; } if ((flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; From patchwork Fri Apr 16 15:41:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208007 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49981C43462 for ; Fri, 16 Apr 2021 15:41:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 31B22613E4 for ; Fri, 16 Apr 2021 15:41:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344065AbhDPPmS (ORCPT ); Fri, 16 Apr 2021 11:42:18 -0400 Received: from mga07.intel.com ([134.134.136.100]:3339 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343961AbhDPPmK (ORCPT ); Fri, 16 Apr 2021 11:42:10 -0400 IronPort-SDR: xXrc7HYIwZTDsgwF1KUCZWZZFNKqHukzxVxjwAPQjmDg2wph8cdNfwSUYaEVXnLXkEGI5a0rwG 6kN1Wfxoa3zg== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="259013598" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="259013598" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:43 -0700 IronPort-SDR: smzvmzhAz7siSWzlCg7OhYyupxhl2DWlKLBDXV9vxp3jd807ASXENtwiD31DJV3hL/ZKXosc7a d9bJxAknmEZA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="533495256" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga004.jf.intel.com with ESMTP; 16 Apr 2021 08:41:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 2964030E; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 09/13] shmem: Fail shmem_getpage_gfp() on poisoned pages Date: Fri, 16 Apr 2021 18:41:02 +0300 Message-Id: <20210416154106.23721-10-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Forbid access to poisoned pages. TODO: Probably more fine-grained approach is needed. It shuld be a allowed to fault-in these pages as hwpoison entries. Not-Signed-off-by: Kirill A. Shutemov --- mm/shmem.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index 7c6b6d8f6c39..d29a0c9be19c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1832,6 +1832,13 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, if (page) hindex = page->index; + + if (page && PageHWPoison(page)) { + unlock_page(page); + put_page(page); + return -EIO; + } + if (page && sgp == SGP_WRITE) mark_page_accessed(page); From patchwork Fri Apr 16 15:41:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208011 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C2F2C43461 for ; Fri, 16 Apr 2021 15:41:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3FAE6613B4 for ; Fri, 16 Apr 2021 15:41:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237334AbhDPPmV (ORCPT ); Fri, 16 Apr 2021 11:42:21 -0400 Received: from mga17.intel.com ([192.55.52.151]:10803 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343993AbhDPPmL (ORCPT ); Fri, 16 Apr 2021 11:42:11 -0400 IronPort-SDR: n94Lt51jOBkjwKnaxLRjWZf5zBWf0bQKIEMond6YyUHQWUshn8yykoqnre/dHhzze7+koInlWM gCrBE7ugeHVQ== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="175163937" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="175163937" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:43 -0700 IronPort-SDR: GV2YKh3YQQwBbOm00/MLd7uBYxj/MRkz8+eXvddmWNUVHa4PpCVjoCFf3t4suV3PJ2PGTLh9zq X73Pit5nhcWQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="422038785" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga007.jf.intel.com with ESMTP; 16 Apr 2021 08:41:38 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 35AC835A; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 10/13] mm: Keep page reference for hwpoison entries Date: Fri, 16 Apr 2021 18:41:03 +0300 Message-Id: <20210416154106.23721-11-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On migration we drop referece to a page once PTE gets replaced with a migration entry. Migration code keeps the last reference on the page that gets dropped once migration is complete. We do the same for hwpoison entries, but it doesn't work: nobody keeps the reference once page is remapped with hwpoison entries. Keep page referenced on setting up hwpoison entries, copy the reference on fork() and return on zap(). Signed-off-by: Kirill A. Shutemov --- mm/memory.c | 6 ++++++ mm/rmap.c | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index feff48e1465a..b15b0c582186 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -767,6 +767,9 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte = pte_swp_mkuffd_wp(pte); set_pte_at(src_mm, addr, src_pte, pte); } + } else if (is_hwpoison_entry(entry)) { + page = hwpoison_entry_to_page(entry); + get_page(page); } set_pte_at(dst_mm, addr, dst_pte, pte); return 0; @@ -1305,6 +1308,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; + + } else if (is_hwpoison_entry(entry)) { + put_page(hwpoison_entry_to_page(entry)); } if (unlikely(!free_swap_and_cache(entry))) print_bad_pte(vma, addr, ptent, NULL); diff --git a/mm/rmap.c b/mm/rmap.c index 08c56aaf72eb..f08d1fc28522 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1575,7 +1575,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(page)); set_pte_at(mm, address, pvmw.pte, pteval); } - + get_page(page); } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no From patchwork Fri Apr 16 15:41:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208009 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24227C43460 for ; Fri, 16 Apr 2021 15:41:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0C1E6613B4 for ; Fri, 16 Apr 2021 15:41:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236712AbhDPPmT (ORCPT ); Fri, 16 Apr 2021 11:42:19 -0400 Received: from mga17.intel.com ([192.55.52.151]:10800 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343992AbhDPPmL (ORCPT ); Fri, 16 Apr 2021 11:42:11 -0400 IronPort-SDR: wfem44kkoBDDVcbXMsVSG37pJir2Mo55HGtqbltnwwhHGatu/HRpw7VeaD4HppdxlhYfhaxR0H 4eVLQDpBwSHQ== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="175163938" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="175163938" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:43 -0700 IronPort-SDR: BiatwMA0y8qUcIW7/n3v5VFpxjtDSkQkoZb1k+sB1Z4cLkxVSZkRIORtiu9iZm6VdBUjhLkGV4 eMT6uG8NXbTQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="422038787" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga007.jf.intel.com with ESMTP; 16 Apr 2021 08:41:38 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 41BE135E; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 11/13] mm: Replace hwpoison entry with present PTE if page got unpoisoned Date: Fri, 16 Apr 2021 18:41:04 +0300 Message-Id: <20210416154106.23721-12-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org If the page got unpoisoned we can replace hwpoison entry with a present PTE on page fault instead of delivering SIGBUS. Signed-off-by: Kirill A. Shutemov --- mm/memory.c | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index b15b0c582186..56f93e8e98f9 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3280,7 +3280,43 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) vmf->page = device_private_entry_to_page(entry); ret = vmf->page->pgmap->ops->migrate_to_ram(vmf); } else if (is_hwpoison_entry(entry)) { - ret = VM_FAULT_HWPOISON; + page = hwpoison_entry_to_page(entry); + + locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags); + if (!locked) { + ret = VM_FAULT_RETRY; + goto out; + } + + vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + + if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { + ret = 0; + } else if (PageHWPoison(page)) { + ret = VM_FAULT_HWPOISON; + } else { + /* + * The page is unpoisoned. Replace hwpoison + * entry with a present PTE. + */ + + inc_mm_counter(vma->vm_mm, mm_counter(page)); + pte = mk_pte(page, vma->vm_page_prot); + + if (PageAnon(page)) { + page_add_anon_rmap(page, vma, + vmf->address, false); + } else { + page_add_file_rmap(page, false); + } + + set_pte_at(vma->vm_mm, vmf->address, + vmf->pte, pte); + } + + pte_unmap_unlock(vmf->pte, vmf->ptl); + unlock_page(page); } else { print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL); ret = VM_FAULT_SIGBUS; From patchwork Fri Apr 16 15:41:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208013 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CA7EC433B4 for ; Fri, 16 Apr 2021 15:41:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 20ED0613B4 for ; Fri, 16 Apr 2021 15:41:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237480AbhDPPmW (ORCPT ); Fri, 16 Apr 2021 11:42:22 -0400 Received: from mga06.intel.com ([134.134.136.31]:18952 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343979AbhDPPmQ (ORCPT ); Fri, 16 Apr 2021 11:42:16 -0400 IronPort-SDR: ozDk07V9sdvuAs92B9HnPL8tevharSccVgb9fFyKZ6EDbvhiCvVOiYK1wrbsRg4dMXpjmc3H9L UMr2K7jVGaDQ== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="256371746" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="256371746" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:51 -0700 IronPort-SDR: uZvFooXAZtP/DOt3rfOjQaJ5N9ruvvbTTjYYdv9o7baNrfOMWFIO2CCTVrIbUQKBl9rMlyUCyp FuEPe04gY7NA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="425635445" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga008.jf.intel.com with ESMTP; 16 Apr 2021 08:41:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 4EFA537E; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 12/13] KVM: passdown struct kvm to hva_to_pfn_slow() Date: Fri, 16 Apr 2021 18:41:05 +0300 Message-Id: <20210416154106.23721-13-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Make struct kvm pointer available within hva_to_pfn_slow(). It is prepartion for the next patch. Signed-off-by: Kirill A. Shutemov --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +- arch/x86/kvm/mmu/mmu.c | 8 +++-- include/linux/kvm_host.h | 12 ++++--- virt/kvm/kvm_main.c | 49 ++++++++++++++------------ 5 files changed, 41 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 38ea396a23d6..86781ff76fcb 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -589,7 +589,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, write_ok = true; } else { /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(kvm, memslot, gfn, false, NULL, writing, &write_ok); if (is_error_noslot_pfn(pfn)) return -EFAULT; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index bb35490400e9..319a1a99153f 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -821,7 +821,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, unsigned long pfn; /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(kvm, memslot, gfn, false, NULL, writing, upgrade_p); if (is_error_noslot_pfn(pfn)) return -EFAULT; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6d16481aa29d..3b97342af026 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2687,7 +2687,7 @@ static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, if (!slot) return KVM_PFN_ERR_FAULT; - return gfn_to_pfn_memslot_atomic(slot, gfn); + return gfn_to_pfn_memslot_atomic(vcpu->kvm, slot, gfn); } static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, @@ -3672,7 +3672,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, } async = false; - *pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async, write, writable); + *pfn = __gfn_to_pfn_memslot(vcpu->kvm, slot, gfn, false, + &async, write, writable); if (!async) return false; /* *pfn has correct page already */ @@ -3686,7 +3687,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, return true; } - *pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL, write, writable); + *pfn = __gfn_to_pfn_memslot(vcpu->kvm, slot, gfn, false, + NULL, write, writable); return false; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f3b1013fb22c..fadaccb95a4c 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -725,11 +725,13 @@ void kvm_set_page_accessed(struct page *page); kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable); -kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn); -kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn); -kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, - bool *writable); +kvm_pfn_t gfn_to_pfn_memslot(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn); +kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn); +kvm_pfn_t __gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, + gfn_t gfn, bool atomic, bool *async, + bool write_fault, bool *writable); void kvm_release_pfn_clean(kvm_pfn_t pfn); void kvm_release_pfn_dirty(kvm_pfn_t pfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8367d88ce39b..3e2dbaef7979 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2009,9 +2009,9 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, return pfn; } -kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, - bool *writable) +kvm_pfn_t __gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, + gfn_t gfn, bool atomic, bool *async, + bool write_fault, bool *writable) { unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); @@ -2041,38 +2041,43 @@ EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, - write_fault, writable); + return __gfn_to_pfn_memslot(kvm, gfn_to_memslot(kvm, gfn), gfn, false, + NULL, write_fault, writable); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); -kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn) +kvm_pfn_t gfn_to_pfn_memslot(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL); + return __gfn_to_pfn_memslot(kvm, slot, gfn, false, NULL, true, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); -kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn) +kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL); + return __gfn_to_pfn_memslot(kvm, slot, gfn, true, NULL, true, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn) { - return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn); + return gfn_to_pfn_memslot_atomic(vcpu->kvm, + kvm_vcpu_gfn_to_memslot(vcpu, gfn), + gfn); } EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic); kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) { - return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn); + return gfn_to_pfn_memslot(kvm, gfn_to_memslot(kvm, gfn), gfn); } EXPORT_SYMBOL_GPL(gfn_to_pfn); kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn) { - return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn); + return gfn_to_pfn_memslot(vcpu->kvm, + kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn); } EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn); @@ -2130,21 +2135,21 @@ void kvm_release_pfn(kvm_pfn_t pfn, bool dirty, struct gfn_to_pfn_cache *cache) kvm_release_pfn_clean(pfn); } -static void kvm_cache_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn, +static void kvm_cache_gfn_to_pfn(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn, struct gfn_to_pfn_cache *cache, u64 gen) { kvm_release_pfn(cache->pfn, cache->dirty, cache); - cache->pfn = gfn_to_pfn_memslot(slot, gfn); + cache->pfn = gfn_to_pfn_memslot(kvm, slot, gfn); cache->gfn = gfn; cache->dirty = false; cache->generation = gen; } -static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn, - struct kvm_host_map *map, - struct gfn_to_pfn_cache *cache, - bool atomic) +static int __kvm_map_gfn(struct kvm *kvm, struct kvm_memslots *slots, + gfn_t gfn, struct kvm_host_map *map, + struct gfn_to_pfn_cache *cache, bool atomic) { kvm_pfn_t pfn; void *hva = NULL; @@ -2160,13 +2165,13 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn, cache->generation != gen) { if (atomic) return -EAGAIN; - kvm_cache_gfn_to_pfn(slot, gfn, cache, gen); + kvm_cache_gfn_to_pfn(kvm, slot, gfn, cache, gen); } pfn = cache->pfn; } else { if (atomic) return -EAGAIN; - pfn = gfn_to_pfn_memslot(slot, gfn); + pfn = gfn_to_pfn_memslot(kvm, slot, gfn); } if (is_error_noslot_pfn(pfn)) return -EINVAL; @@ -2199,14 +2204,14 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn, int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map, struct gfn_to_pfn_cache *cache, bool atomic) { - return __kvm_map_gfn(kvm_memslots(vcpu->kvm), gfn, map, + return __kvm_map_gfn(vcpu->kvm, kvm_memslots(vcpu->kvm), gfn, map, cache, atomic); } EXPORT_SYMBOL_GPL(kvm_map_gfn); int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map) { - return __kvm_map_gfn(kvm_vcpu_memslots(vcpu), gfn, map, + return __kvm_map_gfn(vcpu->kvm, kvm_vcpu_memslots(vcpu), gfn, map, NULL, false); } EXPORT_SYMBOL_GPL(kvm_vcpu_map); From patchwork Fri Apr 16 15:41:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A . Shutemov" X-Patchwork-Id: 12208015 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2401BC43462 for ; Fri, 16 Apr 2021 15:42:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 093156121F for ; Fri, 16 Apr 2021 15:42:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344104AbhDPPmX (ORCPT ); Fri, 16 Apr 2021 11:42:23 -0400 Received: from mga09.intel.com ([134.134.136.24]:4092 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344009AbhDPPmS (ORCPT ); Fri, 16 Apr 2021 11:42:18 -0400 IronPort-SDR: ys/ErzOvSPh31eGEN6I+iY/Zeoje/pxRuMWX9rdVO037hlrdbFAaCsrz612T5oiqPZ0ZJTriWM 22DtxA4PEgjA== X-IronPort-AV: E=McAfee;i="6200,9189,9956"; a="195169170" X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="195169170" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2021 08:41:53 -0700 IronPort-SDR: 1PONy81YMcZYKwhg3gFvr/r9D7/FGF9ROpTuRsjtnW2sVq/T3fT9LLPWXFndWTfBDjYnn8O62R mTk2PTcSoT0g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,226,1613462400"; d="scan'208";a="419149007" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 16 Apr 2021 08:41:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 5BA5A388; Fri, 16 Apr 2021 18:41:50 +0300 (EEST) From: "Kirill A. Shutemov" To: Dave Hansen , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Jim Mattson Cc: David Rientjes , "Edgecombe, Rick P" , "Kleen, Andi" , "Yamahata, Isaku" , Erdem Aktas , Steve Rutherford , Peter Gonda , David Hildenbrand , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages Date: Fri, 16 Apr 2021 18:41:06 +0300 Message-Id: <20210416154106.23721-14-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org TDX architecture aims to provide resiliency against confidentiality and integrity attacks. Towards this goal, the TDX architecture helps enforce the enabling of memory integrity for all TD-private memory. The CPU memory controller computes the integrity check value (MAC) for the data (cache line) during writes, and it stores the MAC with the memory as meta-data. A 28-bit MAC is stored in the ECC bits. Checking of memory integrity is performed during memory reads. If integrity check fails, CPU poisones cache line. On a subsequent consumption (read) of the poisoned data by software, there are two possible scenarios: - Core determines that the execution can continue and it treats poison with exception semantics signaled as a #MCE - Core determines execution cannot continue,and it does an unbreakable shutdown For more details, see Chapter 14 of Intel TDX Module EAS[1] As some of integrity check failures may lead to system shutdown host kernel must not allow any writes to TD-private memory. This requirment clashes with KVM design: KVM expects the guest memory to be mapped into host userspace (e.g. QEMU). This patch aims to start discussion on how we can approach the issue. For now I intentionally keep TDX out of picture here and try to find a generic way to unmap KVM guest memory from host userspace. Hopefully, it makes the patch more approachable. And anyone can try it out. To the proposal: Looking into existing codepaths I've discovered that we already have semantics we want. That's PG_hwpoison'ed pages and SWP_HWPOISON swap entries in page tables: - If an application touches a page mapped with the SWP_HWPOISON, it will get SIGBUS. - GUP will fail with -EFAULT; Access the poisoned memory via page cache doesn't match required semantics right now, but it shouldn't be too hard to make it work: access to poisoned dirty pages should give -EIO or -EHWPOISON. My idea is that we can mark page as poisoned when we make it TD-private and replace all PTEs that map the page with SWP_HWPOISON. TODO: THP support is missing. [1] https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf Not-signed-off-by: Kirill A. Shutemov Signed-off-by: Kirill A. Shutemov Signed-off-by: Kirill A. Shutemov --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/cpuid.c | 3 +- arch/x86/kvm/mmu/mmu.c | 12 ++- arch/x86/kvm/mmu/paging_tmpl.h | 10 +- arch/x86/kvm/x86.c | 6 ++ include/linux/kvm_host.h | 19 ++++ include/uapi/linux/kvm_para.h | 1 + mm/page_alloc.c | 7 ++ virt/Makefile | 2 +- virt/kvm/Kconfig | 4 + virt/kvm/Makefile | 1 + virt/kvm/kvm_main.c | 163 ++++++++++++++++++++++++++++----- virt/kvm/mem_protected.c | 110 ++++++++++++++++++++++ 13 files changed, 311 insertions(+), 28 deletions(-) create mode 100644 virt/kvm/Makefile create mode 100644 virt/kvm/mem_protected.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 7ac592664c52..b7db1c455e7c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -46,6 +46,7 @@ config KVM select KVM_GENERIC_DIRTYLOG_READ_PROTECT select KVM_VFIO select SRCU + select HAVE_KVM_PROTECTED_MEMORY help Support hosting fully virtualized guest machines using hardware virtualization extensions. You will need a fairly recent diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 38172ca627d3..1457692c1080 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -796,7 +796,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) (1 << KVM_FEATURE_PV_SEND_IPI) | (1 << KVM_FEATURE_POLL_CONTROL) | (1 << KVM_FEATURE_PV_SCHED_YIELD) | - (1 << KVM_FEATURE_ASYNC_PF_INT); + (1 << KVM_FEATURE_ASYNC_PF_INT) | + (1 << KVM_FEATURE_MEM_PROTECTED); if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3b97342af026..3fa76693edcd 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include @@ -2758,7 +2759,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) if (sp->role.level > PG_LEVEL_4K) return; - __direct_pte_prefetch(vcpu, sp, sptep); + if (!vcpu->kvm->mem_protected) + __direct_pte_prefetch(vcpu, sp, sptep); } static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -3725,6 +3727,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r)) return r; + if (vcpu->kvm->mem_protected && unlikely(!is_noslot_pfn(pfn))) { + if (!kvm_protect_pfn(vcpu->kvm, gfn, pfn)) { + unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gfn); + kvm_send_hwpoison_signal(hva, current); + return RET_PF_RETRY; + } + } + r = RET_PF_RETRY; spin_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 50e268eb8e1a..26b0494a1207 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -397,8 +397,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, goto error; ptep_user = (pt_element_t __user *)((void *)host_addr + offset); - if (unlikely(__get_user(pte, ptep_user))) - goto error; + if (vcpu->kvm->mem_protected) { + if (copy_from_guest(vcpu->kvm, &pte, host_addr + offset, + sizeof(pte))) + goto error; + } else { + if (unlikely(__get_user(pte, ptep_user))) + goto error; + } walker->ptep_user[walker->level - 1] = ptep_user; trace_kvm_mmu_paging_element(pte, walker->level); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1b404e4d7dd8..f8183386abe7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8170,6 +8170,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) kvm_sched_yield(vcpu->kvm, a0); ret = 0; break; + case KVM_HC_ENABLE_MEM_PROTECTED: + ret = kvm_protect_memory(vcpu->kvm); + break; + case KVM_HC_MEM_SHARE: + ret = kvm_share_memory(vcpu->kvm, a0, a1); + break; default: ret = -KVM_ENOSYS; break; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index fadaccb95a4c..cd2374802702 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -436,6 +436,8 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu) } #endif +#define KVM_NR_SHARED_RANGES 32 + /* * Note: * memslots are not sorted by id anymore, please use id_to_memslot() @@ -513,6 +515,10 @@ struct kvm { pid_t userspace_pid; unsigned int max_halt_poll_ns; u32 dirty_ring_size; + bool mem_protected; + void *id; + int nr_shared_ranges; + struct range shared_ranges[KVM_NR_SHARED_RANGES]; }; #define kvm_err(fmt, ...) \ @@ -709,6 +715,16 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm); void kvm_arch_flush_shadow_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); +int kvm_protect_memory(struct kvm *kvm); + +void __kvm_share_memory(struct kvm *kvm, unsigned long start, unsigned long end); +int kvm_share_memory(struct kvm *kvm, unsigned long gfn, unsigned long npages); + +bool kvm_protect_pfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn); +void kvm_share_pfn(struct kvm *kvm, kvm_pfn_t pfn); + +bool kvm_page_allowed(struct kvm *kvm, struct page *page); + int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn, struct page **pages, int nr_pages); @@ -718,6 +734,9 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable); unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn); unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn, bool *writable); +int copy_from_guest(struct kvm *kvm, void *data, unsigned long hva, int len); +int copy_to_guest(struct kvm *kvm, unsigned long hva, const void *data, int len); + void kvm_release_page_clean(struct page *page); void kvm_release_page_dirty(struct page *page); void kvm_set_page_accessed(struct page *page); diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 09d36683ee0a..743e621111f0 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -17,6 +17,7 @@ #define KVM_E2BIG E2BIG #define KVM_EPERM EPERM #define KVM_EOPNOTSUPP 95 +#define KVM_EINTR EINTR #define KVM_HC_VAPIC_POLL_IRQ 1 #define KVM_HC_MMU_OP 2 diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 519a60d5b6f7..5d05129e8b0c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1224,6 +1224,13 @@ static __always_inline bool free_pages_prepare(struct page *page, trace_mm_page_free(page, order); + if (IS_ENABLED(CONFIG_HAVE_KVM_PROTECTED_MEMORY) && + unlikely(PageHWPoison(page))) { + void kvm_unpoison_page(struct page *page); + + kvm_unpoison_page(page); + } + if (unlikely(PageHWPoison(page)) && !order) { /* * Do not let hwpoison pages hit pcplists/buddy diff --git a/virt/Makefile b/virt/Makefile index 1cfea9436af9..a931049086a3 100644 --- a/virt/Makefile +++ b/virt/Makefile @@ -1,2 +1,2 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-y += lib/ +obj-y += lib/ kvm/ diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 1c37ccd5d402..5f28048718f4 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -63,3 +63,7 @@ config HAVE_KVM_NO_POLL config KVM_XFER_TO_GUEST_WORK bool + +config HAVE_KVM_PROTECTED_MEMORY + bool + select MEMORY_FAILURE diff --git a/virt/kvm/Makefile b/virt/kvm/Makefile new file mode 100644 index 000000000000..f10c62b98968 --- /dev/null +++ b/virt/kvm/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_HAVE_KVM_PROTECTED_MEMORY) += mem_protected.o diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3e2dbaef7979..9cbd3716b3e1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -741,6 +742,7 @@ static struct kvm *kvm_create_vm(unsigned long type) struct kvm *kvm = kvm_arch_alloc_vm(); int r = -ENOMEM; int i; + static long id = 0; if (!kvm) return ERR_PTR(-ENOMEM); @@ -803,6 +805,9 @@ static struct kvm *kvm_create_vm(unsigned long type) mutex_lock(&kvm_lock); list_add(&kvm->vm_list, &vm_list); + kvm->id = xa_mk_value(id++); + if (id < 0) + id = 0; mutex_unlock(&kvm_lock); preempt_notifier_inc(); @@ -1852,7 +1857,8 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, * The slow path to get the pfn of the specified host virtual address, * 1 indicates success, -errno is returned if error is detected. */ -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, +static int hva_to_pfn_slow(struct kvm *kvm, unsigned long addr, + bool *async, bool write_fault, bool *writable, kvm_pfn_t *pfn) { unsigned int flags = FOLL_HWPOISON; @@ -1868,11 +1874,17 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, flags |= FOLL_WRITE; if (async) flags |= FOLL_NOWAIT; + if (kvm->mem_protected) + flags |= FOLL_ALLOW_POISONED; npages = get_user_pages_unlocked(addr, 1, &page, flags); if (npages != 1) return npages; + if (IS_ENABLED(CONFIG_HAVE_KVM_PROTECTED_MEMORY) && + kvm->mem_protected && !kvm_page_allowed(kvm, page)) + return -EHWPOISON; + /* map read fault as writable if possible */ if (unlikely(!write_fault) && writable) { struct page *wpage; @@ -1961,8 +1973,9 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, * 2): @write_fault = false && @writable, @writable will tell the caller * whether the mapping is writable. */ -static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, - bool write_fault, bool *writable) +static kvm_pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, + bool atomic, bool *async, + bool write_fault, bool *writable) { struct vm_area_struct *vma; kvm_pfn_t pfn = 0; @@ -1977,7 +1990,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (atomic) return KVM_PFN_ERR_FAULT; - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); + npages = hva_to_pfn_slow(kvm, addr, async, write_fault, writable, &pfn); if (npages == 1) return pfn; @@ -2033,8 +2046,7 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, writable = NULL; } - return hva_to_pfn(addr, atomic, async, write_fault, - writable); + return hva_to_pfn(kvm, addr, atomic, async, write_fault, writable); } EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); @@ -2338,19 +2350,93 @@ static int next_segment(unsigned long len, int offset) return len; } -static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn, - void *data, int offset, int len) +int copy_from_guest(struct kvm *kvm, void *data, unsigned long hva, int len) +{ + int offset = offset_in_page(hva); + struct page *page; + int npages, seg; + void *vaddr; + + if (!IS_ENABLED(CONFIG_HAVE_KVM_PROTECTED_MEMORY) || + !kvm->mem_protected) { + return __copy_from_user(data, (void __user *)hva, len); + } + + might_fault(); + kasan_check_write(data, len); + check_object_size(data, len, false); + + while ((seg = next_segment(len, offset)) != 0) { + npages = get_user_pages_unlocked(hva, 1, &page, + FOLL_ALLOW_POISONED); + if (npages != 1) + return -EFAULT; + + if (!kvm_page_allowed(kvm, page)) + return -EFAULT; + + vaddr = kmap_atomic(page); + memcpy(data, vaddr + offset, seg); + kunmap_atomic(vaddr); + + put_page(page); + len -= seg; + hva += seg; + data += seg; + offset = 0; + } + + return 0; +} + +int copy_to_guest(struct kvm *kvm, unsigned long hva, const void *data, int len) +{ + int offset = offset_in_page(hva); + struct page *page; + int npages, seg; + void *vaddr; + + if (!IS_ENABLED(CONFIG_HAVE_KVM_PROTECTED_MEMORY) || + !kvm->mem_protected) { + return __copy_to_user((void __user *)hva, data, len); + } + + might_fault(); + kasan_check_read(data, len); + check_object_size(data, len, true); + + while ((seg = next_segment(len, offset)) != 0) { + npages = get_user_pages_unlocked(hva, 1, &page, + FOLL_WRITE | FOLL_ALLOW_POISONED); + if (npages != 1) + return -EFAULT; + + if (!kvm_page_allowed(kvm, page)) + return -EFAULT; + + vaddr = kmap_atomic(page); + memcpy(vaddr + offset, data, seg); + kunmap_atomic(vaddr); + + put_page(page); + len -= seg; + hva += seg; + data += seg; + offset = 0; + } + + return 0; +} + +static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot, + gfn_t gfn, void *data, int offset, int len) { - int r; unsigned long addr; addr = gfn_to_hva_memslot_prot(slot, gfn, NULL); if (kvm_is_error_hva(addr)) return -EFAULT; - r = __copy_from_user(data, (void __user *)addr + offset, len); - if (r) - return -EFAULT; - return 0; + return copy_from_guest(kvm, data, addr + offset, len); } int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, @@ -2358,7 +2444,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); - return __kvm_read_guest_page(slot, gfn, data, offset, len); + return __kvm_read_guest_page(kvm, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_read_guest_page); @@ -2367,7 +2453,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, { struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - return __kvm_read_guest_page(slot, gfn, data, offset, len); + return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page); @@ -2449,7 +2535,8 @@ static int __kvm_write_guest_page(struct kvm *kvm, addr = gfn_to_hva_memslot(memslot, gfn); if (kvm_is_error_hva(addr)) return -EFAULT; - r = __copy_to_user((void __user *)addr + offset, data, len); + + r = copy_to_guest(kvm, addr + offset, data, len); if (r) return -EFAULT; mark_page_dirty_in_slot(kvm, memslot, gfn); @@ -2586,7 +2673,7 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, if (unlikely(!ghc->memslot)) return kvm_write_guest(kvm, gpa, data, len); - r = __copy_to_user((void __user *)ghc->hva + offset, data, len); + r = copy_to_guest(kvm, ghc->hva + offset, data, len); if (r) return -EFAULT; mark_page_dirty_in_slot(kvm, ghc->memslot, gpa >> PAGE_SHIFT); @@ -2607,7 +2694,6 @@ int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, unsigned long len) { struct kvm_memslots *slots = kvm_memslots(kvm); - int r; gpa_t gpa = ghc->gpa + offset; BUG_ON(len + offset > ghc->len); @@ -2623,11 +2709,7 @@ int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, if (unlikely(!ghc->memslot)) return kvm_read_guest(kvm, gpa, data, len); - r = __copy_from_user(data, (void __user *)ghc->hva + offset, len); - if (r) - return -EFAULT; - - return 0; + return copy_from_guest(kvm, data, ghc->hva + offset, len); } EXPORT_SYMBOL_GPL(kvm_read_guest_offset_cached); @@ -2693,6 +2775,41 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); +int kvm_protect_memory(struct kvm *kvm) +{ + if (mmap_write_lock_killable(kvm->mm)) + return -KVM_EINTR; + + kvm->mem_protected = true; + kvm_arch_flush_shadow_all(kvm); + mmap_write_unlock(kvm->mm); + + return 0; +} + +int kvm_share_memory(struct kvm *kvm, unsigned long gfn, unsigned long npages) +{ + unsigned long end = gfn + npages; + + if (!npages || !IS_ENABLED(CONFIG_HAVE_KVM_PROTECTED_MEMORY)) + return 0; + + __kvm_share_memory(kvm, gfn, end); + + for (; gfn < end; gfn++) { + struct page *page = gfn_to_page(kvm, gfn); + unsigned long pfn = page_to_pfn(page); + + if (page == KVM_ERR_PTR_BAD_PAGE) + continue; + + kvm_share_pfn(kvm, pfn); + put_page(page); + } + + return 0; +} + void kvm_sigset_activate(struct kvm_vcpu *vcpu) { if (!vcpu->sigset_active) diff --git a/virt/kvm/mem_protected.c b/virt/kvm/mem_protected.c new file mode 100644 index 000000000000..81882bd3232b --- /dev/null +++ b/virt/kvm/mem_protected.c @@ -0,0 +1,110 @@ +#include +#include +#include + +static DEFINE_XARRAY(kvm_pfn_map); + +static bool gfn_is_shared(struct kvm *kvm, unsigned long gfn) +{ + bool ret = false; + int i; + + spin_lock(&kvm->mmu_lock); + for (i = 0; i < kvm->nr_shared_ranges; i++) { + if (gfn < kvm->shared_ranges[i].start) + continue; + if (gfn >= kvm->shared_ranges[i].end) + continue; + + ret = true; + break; + } + spin_unlock(&kvm->mmu_lock); + + return ret; +} + +bool kvm_protect_pfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn) +{ + struct page *page = pfn_to_page(pfn); + bool ret = true; + + if (gfn_is_shared(kvm, gfn)) + return true; + + if (is_zero_pfn(pfn)) + return true; + + /* Only anonymous and shmem/tmpfs pages are supported */ + if (!PageSwapBacked(page)) + return false; + + lock_page(page); + + /* Recheck gfn_is_shared() under page lock */ + if (gfn_is_shared(kvm, gfn)) + goto out; + + if (!TestSetPageHWPoison(page)) { + try_to_unmap(page, TTU_IGNORE_MLOCK); + xa_store(&kvm_pfn_map, pfn, kvm->id, GFP_KERNEL); + } else if (xa_load(&kvm_pfn_map, pfn) != kvm->id) { + ret = false; + } +out: + unlock_page(page); + return ret; +} +EXPORT_SYMBOL_GPL(kvm_protect_pfn); + +void __kvm_share_memory(struct kvm *kvm, + unsigned long start, unsigned long end) +{ + /* + * Out of slots. + * Still worth to proceed: the new range may merge with an existing + * one. + */ + WARN_ON_ONCE(kvm->nr_shared_ranges == ARRAY_SIZE(kvm->shared_ranges)); + + spin_lock(&kvm->mmu_lock); + kvm->nr_shared_ranges = add_range_with_merge(kvm->shared_ranges, + ARRAY_SIZE(kvm->shared_ranges), + kvm->nr_shared_ranges, + start, end); + kvm->nr_shared_ranges = clean_sort_range(kvm->shared_ranges, + ARRAY_SIZE(kvm->shared_ranges)); + spin_unlock(&kvm->mmu_lock); +} +EXPORT_SYMBOL(__kvm_share_memory); + +void kvm_share_pfn(struct kvm *kvm, kvm_pfn_t pfn) +{ + struct page *page = pfn_to_page(pfn); + + lock_page(page); + if (xa_load(&kvm_pfn_map, pfn) == kvm->id) { + xa_erase(&kvm_pfn_map, pfn); + ClearPageHWPoison(page); + } + unlock_page(page); +} +EXPORT_SYMBOL(kvm_share_pfn); + +void kvm_unpoison_page(struct page *page) +{ + unsigned long pfn = page_to_pfn(page); + + if (xa_erase(&kvm_pfn_map, pfn)) + ClearPageHWPoison(page); +} + +bool kvm_page_allowed(struct kvm *kvm, struct page *page) +{ + unsigned long pfn = page_to_pfn(page); + + if (!PageHWPoison(page)) + return true; + + return xa_load(&kvm_pfn_map, pfn) == kvm->id; +}