From patchwork Thu Oct 3 21:23:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173453 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B7D815AB for ; Thu, 3 Oct 2019 21:40:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E838E215EA for ; Thu, 3 Oct 2019 21:40:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388735AbfJCVkA (ORCPT ); Thu, 3 Oct 2019 17:40:00 -0400 Received: from mga09.intel.com ([134.134.136.24]:52653 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387926AbfJCVi7 (ORCPT ); Thu, 3 Oct 2019 17:38:59 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2019 14:38:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,253,1566889200"; d="scan'208";a="186051615" Received: from linksys13920.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.11]) by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 14:38:57 -0700 From: Rick Edgecombe To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe Subject: [RFC PATCH 01/13] kvm: Enable MTRR to work with GFNs with perm bits Date: Thu, 3 Oct 2019 14:23:48 -0700 Message-Id: <20191003212400.31130-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: 
<20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Mask gfn by maxphyaddr in kvm_mtrr_get_guest_memory_type so that the guest's view of the gfn is used when high bits of the physical memory are used as extra permission bits. This supports the KVM XO feature. TODO: Since MTRR is emulated using EPT permissions, the XO version of the gpa range will not inherit the MTRR type with this implementation. There shouldn't be any legacy use of KVM XO, but hypothetically it could interfere with the uncacheable MTRR type. Signed-off-by: Rick Edgecombe --- arch/x86/kvm/mtrr.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index 25ce3edd1872..da38f3b83e51 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -621,6 +621,14 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK) | (1 << MTRR_TYPE_WRTHROUGH); + /* + * Handle situations where gfn bits are used as permission bits by + * masking KVM's view of the gfn with the guest's physical address bits + * in order to match the guest's view of the physical address. For normal + * situations this will have no effect. 
+ */ + gfn &= (1ULL << (cpuid_maxphyaddr(vcpu) - PAGE_SHIFT)) - 1; + start = gfn_to_gpa(gfn); end = start + PAGE_SIZE; From patchwork Thu Oct 3 21:23:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173405 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D041D1902 for ; Thu, 3 Oct 2019 21:39:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B013921D71 for ; Thu, 3 Oct 2019 21:39:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388135AbfJCVjA (ORCPT ); Thu, 3 Oct 2019 17:39:00 -0400 Received: from mga09.intel.com ([134.134.136.24]:52653 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387933AbfJCVjA (ORCPT ); Thu, 3 Oct 2019 17:39:00 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2019 14:38:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,253,1566889200"; d="scan'208";a="186051618" Received: from linksys13920.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.11]) by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 14:38:57 -0700 From: Rick Edgecombe To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe Subject: [RFC PATCH 02/13] kvm: Add support for X86_FEATURE_KVM_XO Date: Thu, 3 Oct 2019 14:23:49 -0700 Message-Id: <20191003212400.31130-3-rick.p.edgecombe@intel.com> X-Mailer: 
git-send-email 2.17.1 In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add X86_FEATURE_KVM_XO which reduces the physical address bits exposed by CPUID and uses the host's highest physical address bit as an XO/NR permission bit in the guest page tables. Adjust the reserved mask so KVM guest page table walks are aware this bit is not reserved. Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/cpufeature.h | 1 + arch/x86/include/asm/cpufeatures.h | 3 +++ arch/x86/include/uapi/asm/kvm_para.h | 3 +++ arch/x86/kvm/cpuid.c | 7 +++++++ arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/mmu.c | 18 ++++++++++++------ 6 files changed, 27 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 58acda503817..17127ffbc2a2 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -30,6 +30,7 @@ enum cpuid_leafs CPUID_7_ECX, CPUID_8000_0007_EBX, CPUID_7_EDX, + CPUID_4000_0030_EAX }; #ifdef CONFIG_X86_FEATURE_NAMES diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index e880f2408e29..7ba217e894ea 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -364,6 +364,9 @@ #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */ #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */ +/* KVM-defined CPU features, CPUID level 0x40000030 (EAX), word 19 */ +#define X86_FEATURE_KVM_XO (19*32+0) /* KVM EPT based execute only memory support */ + /* * BUG word(s) */ diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 2a8e0b6b9805..ecff0ff25cf4 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -34,6 +34,9 @@ #define 
KVM_HINTS_REALTIME 0 +#define KVM_CPUID_FEAT_GENERIC 0x40000030 +#define KVM_FEATURE_GENERIC_XO 0 + /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 22c2720cd948..bcbf3f93602d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -700,6 +700,12 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function, if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); + entry->ebx = 0; + entry->ecx = 0; + entry->edx = 0; + break; + case KVM_CPUID_FEAT_GENERIC: + entry->eax = (1 << KVM_FEATURE_GENERIC_XO); entry->ebx = 0; entry->ecx = 0; entry->edx = 0; @@ -845,6 +851,7 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, { .func = 0x80000000 }, { .func = 0xC0000000, .qualifier = is_centaur_cpu }, { .func = KVM_CPUID_SIGNATURE }, + { .func = KVM_CPUID_FEAT_GENERIC }, }; if (cpuid->nent < 1) diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index d78a61408243..c36d462a0e01 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -53,6 +53,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX}, [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, + [CPUID_4000_0030_EAX] = {0x40000030, 0, CPUID_EAX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a63964e7cec7..e44a8053af78 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -4358,12 +4358,15 @@ static void __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct rsvd_bits_validate *rsvd_check, int maxphyaddr, int level, bool nx, bool gbpages, - bool pse, bool amd) + bool pse, bool amd, bool xo) { u64 exb_bit_rsvd = 0; u64 gbpages_bit_rsvd = 0; u64 nonleaf_bit8_rsvd = 0; + /* Adjust maxphyaddr to include the XO bit if in use */ + maxphyaddr += xo; + 
rsvd_check->bad_mt_xwr = 0; if (!nx) @@ -4448,10 +4451,12 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context) { __reset_rsvds_bits_mask(vcpu, &context->guest_rsvd_check, - cpuid_maxphyaddr(vcpu), context->root_level, + cpuid_maxphyaddr(vcpu), + context->root_level, context->nx, guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES), - is_pse(vcpu), guest_cpuid_is_amd(vcpu)); + is_pse(vcpu), guest_cpuid_is_amd(vcpu), + guest_cpuid_has(vcpu, X86_FEATURE_KVM_XO)); } static void @@ -4520,7 +4525,7 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context) shadow_phys_bits, context->shadow_root_level, uses_nx, guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES), - is_pse(vcpu), true); + is_pse(vcpu), true, false); if (!shadow_me_mask) return; @@ -4557,7 +4562,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, shadow_phys_bits, context->shadow_root_level, false, boot_cpu_has(X86_FEATURE_GBPAGES), - true, true); + true, true, false); else __reset_rsvds_bits_mask_ept(shadow_zero_check, shadow_phys_bits, @@ -4818,7 +4823,8 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu) ext.cr4_pse = !!is_pse(vcpu); ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE); ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57); - ext.maxphyaddr = cpuid_maxphyaddr(vcpu); + ext.maxphyaddr = cpuid_maxphyaddr(vcpu) + + guest_cpuid_has(vcpu, X86_FEATURE_KVM_XO); ext.valid = 1; From patchwork Thu Oct 3 21:23:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173451 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 42F6015AB for ; Thu, 3 Oct 2019 21:39:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2A4E321783 for ; Thu, 3 Oct 2019 
21:39:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389046AbfJCVjx (ORCPT ); Thu, 3 Oct 2019 17:39:53 -0400 Received: from mga09.intel.com ([134.134.136.24]:52651 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730739AbfJCVi7 (ORCPT ); Thu, 3 Oct 2019 17:38:59 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2019 14:38:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,253,1566889200"; d="scan'208";a="186051620" Received: from linksys13920.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.11]) by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 14:38:57 -0700 From: Rick Edgecombe To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe , Yu Zhang Subject: [RFC PATCH 03/13] kvm: Add XO memslot type Date: Thu, 3 Oct 2019 14:23:50 -0700 Message-Id: <20191003212400.31130-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add XO memslot type to create execute-only guest physical memory based on the RO memslot. Like the RO memslot, disallow changing the memslot type to/from XO. In the EPT case ACC_USER_MASK represents the readable bit, so add the ability for set_spte() to unset this. This is based in part on a patch by Yu Zhang. 
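[Editor's note: the slot-modification rule above (a slot's XO-ness, like its read-only-ness, is fixed at creation and may never be toggled) can be sketched as a standalone predicate. The helper name and standalone form are illustrative assumptions; in the patch itself the check is done inline in __kvm_set_memory_region().]

```c
#include <assert.h>
#include <stdbool.h>

/* Memslot flag bits from the uapi header, as extended by this patch. */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)
#define KVM_MEM_EXECONLY        (1UL << 2)

/*
 * Illustrative helper: returns true if a flag change on an existing
 * memslot is permitted. LOG_DIRTY_PAGES may be toggled on a live slot,
 * but READONLY and EXECONLY may not change once the slot exists.
 */
static bool slot_flag_change_allowed(unsigned long old_flags,
				     unsigned long new_flags)
{
	const unsigned long fixed = KVM_MEM_READONLY | KVM_MEM_EXECONLY;

	return ((new_flags ^ old_flags) & fixed) == 0;
}
```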
Signed-off-by: Yu Zhang Signed-off-by: Rick Edgecombe --- arch/x86/kvm/mmu.c | 9 ++++++++- include/uapi/linux/kvm.h | 1 + tools/include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 15 ++++++++++++++- 4 files changed, 24 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index e44a8053af78..338cc64cc821 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2981,6 +2981,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (pte_access & ACC_USER_MASK) spte |= shadow_user_mask; + else + spte &= ~shadow_user_mask; if (level > PT_PAGE_TABLE_LEVEL) spte |= PT_PAGE_SIZE_MASK; @@ -3203,6 +3205,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, int ret; gfn_t gfn = gpa >> PAGE_SHIFT; gfn_t base_gfn = gfn; + struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); + unsigned int pte_access = ACC_ALL; + + if (slot && slot->flags & KVM_MEM_EXECONLY) + pte_access = ACC_EXEC_MASK; if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) return RET_PF_RETRY; @@ -3222,7 +3229,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, } } - ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL, + ret = mmu_set_spte(vcpu, it.sptep, pte_access, write, level, base_gfn, pfn, prefault, map_writable); direct_pte_prefetch(vcpu, it.sptep); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5e3f12d5359e..ede487b7b216 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -109,6 +109,7 @@ struct kvm_userspace_memory_region { */ #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) +#define KVM_MEM_EXECONLY (1UL << 2) /* for KVM_IRQ_LINE */ struct kvm_irq_level { diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 5e3f12d5359e..ede487b7b216 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -109,6 +109,7 @@ struct kvm_userspace_memory_region { */ #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define 
KVM_MEM_READONLY (1UL << 1) +#define KVM_MEM_EXECONLY (1UL << 2) /* for KVM_IRQ_LINE */ struct kvm_irq_level { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c6a91b044d8d..65087c1d67be 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -865,6 +865,8 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m valid_flags |= KVM_MEM_READONLY; #endif + valid_flags |= KVM_MEM_EXECONLY; + if (mem->flags & ~valid_flags) return -EINVAL; @@ -969,9 +971,12 @@ int __kvm_set_memory_region(struct kvm *kvm, if (!old.npages) change = KVM_MR_CREATE; else { /* Modify an existing slot. */ + const __u8 unchangeable = KVM_MEM_READONLY + | KVM_MEM_EXECONLY; + if ((mem->userspace_addr != old.userspace_addr) || (npages != old.npages) || - ((new.flags ^ old.flags) & KVM_MEM_READONLY)) + ((new.flags ^ old.flags) & unchangeable)) goto out; if (base_gfn != old.base_gfn) @@ -1356,6 +1361,11 @@ static bool memslot_is_readonly(struct kvm_memory_slot *slot) return slot->flags & KVM_MEM_READONLY; } +static bool memslot_is_execonly(struct kvm_memory_slot *slot) +{ + return slot->flags & KVM_MEM_EXECONLY; +} + static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn, gfn_t *nr_pages, bool write) { @@ -1365,6 +1375,9 @@ static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn, if (memslot_is_readonly(slot) && write) return KVM_HVA_ERR_RO_BAD; + if (memslot_is_execonly(slot) && write) + return KVM_HVA_ERR_RO_BAD; + if (nr_pages) *nr_pages = slot->npages - (gfn - slot->base_gfn); From patchwork Thu Oct 3 21:23:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173457 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B20976 for ; Thu, 3 Oct 2019 21:40:21 +0000 (UTC) Received: from 
vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 438B7215EA for ; Thu, 3 Oct 2019 21:40:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387490AbfJCVi6 (ORCPT ); Thu, 3 Oct 2019 17:38:58 -0400 Received: from mga09.intel.com ([134.134.136.24]:52651 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730739AbfJCVi6 (ORCPT ); Thu, 3 Oct 2019 17:38:58 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2019 14:38:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,253,1566889200"; d="scan'208";a="186051623" Received: from linksys13920.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.11]) by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 14:38:57 -0700 From: Rick Edgecombe To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe Subject: [RFC PATCH 04/13] kvm, vmx: Add support for gva exit qualification Date: Thu, 3 Oct 2019 14:23:51 -0700 Message-Id: <20191003212400.31130-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org VMX supports providing the guest virtual address that caused an EPT violation. Add support for this so it can be used by the KVM XO feature. 
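[Editor's note: the validity check this patch relies on can be sketched as below. The helper name is illustrative; the bit numbers are the EPT-violation exit-qualification positions defined in the Intel SDM and mirrored in asm/vmx.h.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT-violation exit-qualification bit numbers (Intel SDM). */
#define EPT_VIOLATION_ACC_READ_BIT	0	/* data read access */
#define EPT_VIOLATION_ACC_WRITE_BIT	1	/* data write access */
#define EPT_VIOLATION_ACC_INSTR_BIT	2	/* instruction fetch */
#define EPT_VIOLATION_GVA_LINEAR_VALID	7	/* guest linear address valid */

/*
 * True when bit 7 of the exit qualification is set, i.e. the
 * GUEST_LINEAR_ADDRESS VMCS field holds a valid gva for this violation.
 */
static bool ept_violation_has_gva(uint64_t exit_qualification)
{
	return exit_qualification & (UINT64_C(1) << EPT_VIOLATION_GVA_LINEAR_VALID);
}
```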
Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/kvm_host.h | 4 ++++ arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/vmx/vmx.c | 5 +++++ arch/x86/kvm/x86.c | 1 + 4 files changed, 11 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index bdc16b0aa7c6..b363a7fc47b0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -781,6 +781,10 @@ struct kvm_vcpu_arch { bool gpa_available; gpa_t gpa_val; + /* GVA available */ + bool gva_available; + gva_t gva_val; + /* be preempted when it's in kernel-mode(cpl=0) */ bool preempted_in_kernel; diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index a39136b0d509..67457f2d19e2 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -522,6 +522,7 @@ struct vmx_msr_entry { #define EPT_VIOLATION_READABLE_BIT 3 #define EPT_VIOLATION_WRITABLE_BIT 4 #define EPT_VIOLATION_EXECUTABLE_BIT 5 +#define EPT_VIOLATION_GVA_LINEAR_VALID 7 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8 #define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT) #define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c030c96fc81a..a30dbab8a2d4 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5116,6 +5116,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) error_code |= (exit_qualification & 0x100) != 0 ? 
PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + if (exit_qualification & (1UL << EPT_VIOLATION_GVA_LINEAR_VALID)) { + vcpu->arch.gva_available = true; + vcpu->arch.gva_val = vmcs_readl(GUEST_LINEAR_ADDRESS); + } + vcpu->arch.exit_qualification = exit_qualification; return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 91602d310a3f..aa138d3a86c5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8092,6 +8092,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_lapic_sync_from_vapic(vcpu); vcpu->arch.gpa_available = false; + vcpu->arch.gva_available = false; r = kvm_x86_ops->handle_exit(vcpu); return r; From patchwork Thu Oct 3 21:23:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173449 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8AA815AB for ; Thu, 3 Oct 2019 21:39:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C6D762133F for ; Thu, 3 Oct 2019 21:39:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387974AbfJCVi7 (ORCPT ); Thu, 3 Oct 2019 17:38:59 -0400 Received: from mga09.intel.com ([134.134.136.24]:52651 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387909AbfJCVi6 (ORCPT ); Thu, 3 Oct 2019 17:38:58 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2019 14:38:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,253,1566889200"; d="scan'208";a="186051627" Received: from linksys13920.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.11]) 
by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 14:38:57 -0700 From: Rick Edgecombe To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe Subject: [RFC PATCH 05/13] kvm: Add #PF injection for KVM XO Date: Thu, 3 Oct 2019 14:23:52 -0700 Message-Id: <20191003212400.31130-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org If there is a read or write violation on the gfn range of an XO memslot, then inject a page fault into the guest with the guest virtual address that faulted. This can be done directly if the hardware provides the gva of the access that caused the fault. Otherwise, the violating instruction needs to be emulated to figure it out. TODO: Currently ACC_USER_MASK is used to mean not-readable in the EPT case, but in the x86 page tables case it means the real user bit, and so it can't be overloaded to mean not-readable. A new dedicated ACC_ flag is probably needed for the not-readable case in XOM situations. Instead of changing that everywhere, a conditional is added in paging_tmpl.h to check for the KVM XO bit. This should probably be made to work with the logic in permission_fault instead of having a special case. 
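[Editor's note: the error code injected for such a fault can be sketched as a pure function mirroring try_inject_exec_only_pf() below; the helper name is illustrative. The translation is present and only read permission is missing, so the PROT bit is always set, while the USER and WRITE bits reflect the faulting access.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 #PF error-code bits. */
#define X86_PF_PROT	(1u << 0)	/* fault on a present page */
#define X86_PF_WRITE	(1u << 1)	/* access was a write */
#define X86_PF_USER	(1u << 2)	/* access came from CPL 3 */

/*
 * Error code for a read/write fault on execute-only memory: PROT is
 * always set because the mapping exists and only the read permission
 * is missing; USER and WRITE depend on the faulting access.
 */
static uint16_t xo_pf_error_code(int cpl, bool write)
{
	uint16_t ec = X86_PF_PROT;

	if (cpl == 3)
		ec |= X86_PF_USER;
	if (write)
		ec |= X86_PF_WRITE;
	return ec;
}
```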
Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.c | 52 +++++++++++++++++++++++++++++++++ arch/x86/kvm/paging_tmpl.h | 29 ++++++++++++++---- arch/x86/kvm/x86.c | 5 +++- 4 files changed, 82 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b363a7fc47b0..6d06c794d720 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -785,6 +785,8 @@ struct kvm_vcpu_arch { bool gva_available; gva_t gva_val; + bool xo_fault; + /* be preempted when it's in kernel-mode(cpl=0) */ bool preempted_in_kernel; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 338cc64cc821..d5ba44066b62 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -45,6 +45,7 @@ #include #include #include +#include #include "trace.h" /* @@ -4130,6 +4131,34 @@ check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level) return kvm_mtrr_check_gfn_range_consistency(vcpu, gfn, page_num); } + +static int try_inject_exec_only_pf(struct kvm_vcpu *vcpu, u64 error_code) +{ + struct x86_exception fault; + int cpl = kvm_x86_ops->get_cpl(vcpu); + /* + * There is an assumption here that if there is a TDP violation for an + * XO memslot, then it must be a read or write fault. + */ + u16 fault_error_code = X86_PF_PROT | (cpl == 3 ? 
X86_PF_USER : 0); + + if (!vcpu->arch.gva_available) + return 0; + + if (error_code & PFERR_WRITE_MASK) + fault_error_code |= X86_PF_WRITE; + + fault.vector = PF_VECTOR; + fault.error_code_valid = true; + fault.error_code = fault_error_code; + fault.nested_page_fault = false; + fault.address = vcpu->arch.gva_val; + fault.async_page_fault = true; + kvm_inject_page_fault(vcpu, &fault); + + return 1; +} + static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, bool prefault) { @@ -4141,12 +4170,35 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, unsigned long mmu_seq; int write = error_code & PFERR_WRITE_MASK; bool map_writable; + struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); if (page_fault_handle_page_track(vcpu, error_code, gfn)) return RET_PF_EMULATE; + /* + * Set xo_fault when the fault is a read or write fault on an xo memslot + * so that the emulator knows it needs to check page table permissions + * and will inject a fault. + */ + vcpu->arch.xo_fault = false; + if (slot && unlikely((slot->flags & KVM_MEM_EXECONLY) + && !(error_code & PFERR_FETCH_MASK))) + vcpu->arch.xo_fault = true; + + /* If memslot is xo, need to inject fault */ + if (unlikely(vcpu->arch.xo_fault)) { + /* + * If not enough information to inject the fault, + * emulate to figure it out and emulate the PF. 
+ */ + if (!try_inject_exec_only_pf(vcpu, error_code)) + return RET_PF_EMULATE; + + return 1; + } + r = mmu_topup_memory_caches(vcpu); if (r) return r; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 7d5cdb3af594..eae1871c5225 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -307,7 +307,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, gpa_t pte_gpa; bool have_ad; int offset; - u64 walk_nx_mask = 0; + u64 walk_mask = 0; + u64 walk_nr_mask = 0; + bool kvm_xo = guest_cpuid_has(vcpu, X86_FEATURE_KVM_XO); const int write_fault = access & PFERR_WRITE_MASK; const int user_fault = access & PFERR_USER_MASK; const int fetch_fault = access & PFERR_FETCH_MASK; @@ -322,7 +324,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, have_ad = PT_HAVE_ACCESSED_DIRTY(mmu); #if PTTYPE == 64 - walk_nx_mask = 1ULL << PT64_NX_SHIFT; + walk_mask = 1ULL << PT64_NX_SHIFT; + if (kvm_xo) { + walk_nr_mask = 1ULL << cpuid_maxphyaddr(vcpu); + walk_mask |= walk_nr_mask; + } if (walker->level == PT32E_ROOT_LEVEL) { pte = mmu->get_pdptr(vcpu, (addr >> 30) & 3); trace_kvm_mmu_paging_element(pte, walker->level); @@ -395,7 +401,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, * Inverting the NX it lets us AND it like other * permission bits. */ - pte_access = pt_access & (pte ^ walk_nx_mask); + pte_access = pt_access & (pte ^ walk_mask); if (unlikely(!FNAME(is_present_gpte)(pte))) goto error; @@ -412,12 +418,25 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0; /* Convert to ACC_*_MASK flags for struct guest_walker. 
*/ - walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask); - walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask); + walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_mask); + walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_mask); + errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access); if (unlikely(errcode)) goto error; + /* + * KVM XO bit is not checked in permission_fault(), so check it here and + * inject appropriate fault. + */ + if (kvm_xo && !fetch_fault + && (walk_nr_mask & (pte_access ^ walk_nr_mask))) { + errcode = PFERR_PRESENT_MASK; + if (write_fault) + errcode |= PFERR_WRITE_MASK; + goto error; + } + gfn = gpte_to_gfn_lvl(pte, walker->level); gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index aa138d3a86c5..2e321d788672 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5494,8 +5494,11 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, * Note, this cannot be used on string operations since string * operation using rep will only have the initial GPA from the NPF * occurred. + * + * If the fault was an XO fault, we need to walk the page tables to + * determine the gva and emulate the PF. 
*/ - if (vcpu->arch.gpa_available && + if (!vcpu->arch.xo_fault && vcpu->arch.gpa_available && emulator_can_use_gpa(ctxt) && (addr & ~PAGE_MASK) == (vcpu->arch.gpa_val & ~PAGE_MASK)) { gpa = vcpu->arch.gpa_val; From patchwork Thu Oct 3 21:23:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173413 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 804541902 for ; Thu, 3 Oct 2019 21:39:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6916421D71 for ; Thu, 3 Oct 2019 21:39:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388200AbfJCVjB (ORCPT ); Thu, 3 Oct 2019 17:39:01 -0400 Received: from mga09.intel.com ([134.134.136.24]:52651 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388044AbfJCVjA (ORCPT ); Thu, 3 Oct 2019 17:39:00 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2019 14:38:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,253,1566889200"; d="scan'208";a="186051630" Received: from linksys13920.jf.intel.com (HELO rpedgeco-DESK5.jf.intel.com) ([10.54.75.11]) by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 14:38:57 -0700 From: Rick Edgecombe To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe Subject: [RFC PATCH 06/13] kvm: Add KVM_CAP_EXECONLY_MEM Date: Thu, 3 Oct 2019 14:23:53 
-0700 Message-Id: <20191003212400.31130-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add a KVM capability for the KVM_MEM_EXECONLY memslot type. This memslot type is supported if the HW supports execute-only TDP. Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm.c | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 1 + arch/x86/kvm/x86.c | 3 +++ include/uapi/linux/kvm.h | 1 + 5 files changed, 12 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6d06c794d720..be3ff71e6227 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1132,6 +1132,7 @@ struct kvm_x86_ops { bool (*xsaves_supported)(void); bool (*umip_emulated)(void); bool (*pt_supported)(void); + bool (*tdp_xo_supported)(void); int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr); void (*request_immediate_exit)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index e0368076a1ef..f9f25f32e946 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -6005,6 +6005,11 @@ static bool svm_pt_supported(void) return false; } +static bool svm_xo_supported(void) +{ + return false; +} + static bool svm_has_wbinvd_exit(void) { return true; @@ -7293,6 +7298,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .xsaves_supported = svm_xsaves_supported, .umip_emulated = svm_umip_emulated, .pt_supported = svm_pt_supported, + .tdp_xo_supported = svm_xo_supported, .set_supported_cpuid = svm_set_supported_cpuid, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a30dbab8a2d4..7e7260c715f2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7767,6 +7767,7 @@ static struct kvm_x86_ops vmx_x86_ops 
__ro_after_init = { .xsaves_supported = vmx_xsaves_supported, .umip_emulated = vmx_umip_emulated, .pt_supported = vmx_pt_supported, + .tdp_xo_supported = cpu_has_vmx_ept_execute_only, .request_immediate_exit = vmx_request_immediate_exit, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e321d788672..810cfdb1a315 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3183,6 +3183,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = kvm_x86_ops->get_nested_state ? kvm_x86_ops->get_nested_state(NULL, NULL, 0) : 0; break; + case KVM_CAP_EXECONLY_MEM: + r = kvm_x86_ops->tdp_xo_supported(); + break; default: break; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ede487b7b216..7778a1f03b78 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -997,6 +997,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 #define KVM_CAP_PMU_EVENT_FILTER 173 +#define KVM_CAP_EXECONLY_MEM 174 #ifdef KVM_CAP_IRQ_ROUTING From patchwork Thu Oct 3 21:23:54 2019 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173443 From: Rick Edgecombe Subject: [RFC PATCH 07/13] kvm: Add docs for KVM_CAP_EXECONLY_MEM Date: Thu, 3 Oct 2019 14:23:54 -0700 Message-Id: <20191003212400.31130-8-rick.p.edgecombe@intel.com> In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Add documentation for the KVM_CAP_EXECONLY_MEM capability and KVM_MEM_EXECONLY memslot. Signed-off-by: Rick Edgecombe --- Documentation/virt/kvm/api.txt | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index 2d067767b617..a8001f996a8a 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -1096,6 +1096,7 @@ struct kvm_userspace_memory_region { /* for kvm_memory_region::flags */ #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) +#define KVM_MEM_EXECONLY (1UL << 2) This ioctl allows the user to create, modify or delete a guest physical memory slot.
Bits 0-15 of "slot" specify the slot id and this value @@ -1123,12 +1124,15 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr be identical. This allows large pages in the guest to be backed by large pages in the host. -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and -KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of -writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to -use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it, -to make a new slot read-only. In this case, writes to this memory will be -posted to userspace as KVM_EXIT_MMIO exits. +The flags field supports three flags: KVM_MEM_LOG_DIRTY_PAGES, KVM_MEM_READONLY +and KVM_MEM_EXECONLY. KVM_MEM_LOG_DIRTY_PAGES can be set to instruct KVM to +keep track of writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to +know how to use it. KVM_MEM_READONLY can be set, if KVM_CAP_READONLY_MEM +capability allows it, to make a new slot read-only. In this case, writes to +this memory will be posted to userspace as KVM_EXIT_MMIO exits. KVM_MEM_EXECONLY +can be set, if KVM_CAP_EXECONLY_MEM capability allows it, to make a new slot +exec-only. Guest read accesses to KVM_CAP_EXECONLY_MEM will trigger an +appropriate fault injected into the guest, in support of X86_FEATURE_KVM_XO. When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of the memory region are automatically reflected into the guest. 
For example, an From patchwork Thu Oct 3 21:23:55 2019 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173445 From: Rick Edgecombe Subject: [RFC PATCH 08/13] x86/boot: Rename USE_EARLY_PGTABLE_L5 Date: Thu, 3 Oct 2019 14:23:55 -0700 Message-Id: <20191003212400.31130-9-rick.p.edgecombe@intel.com> In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References:
<20191003212400.31130-1-rick.p.edgecombe@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Rename USE_EARLY_PGTABLE_L5 to USE_EARLY_PGTABLE so that it can be used by other early boot detectable page table features. Signed-off-by: Rick Edgecombe --- arch/x86/boot/compressed/misc.h | 2 +- arch/x86/include/asm/pgtable_64_types.h | 4 ++-- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/head64.c | 2 +- arch/x86/mm/kasan_init_64.c | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index c8181392f70d..45a23aa807bd 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -14,7 +14,7 @@ #undef CONFIG_KASAN /* cpu_feature_enabled() cannot be used this early */ -#define USE_EARLY_PGTABLE_L5 +#define USE_EARLY_PGTABLE #include #include diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 52e5f5f2240d..6b55b837ead4 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -23,7 +23,7 @@ typedef struct { pteval_t pte; } pte_t; #ifdef CONFIG_X86_5LEVEL extern unsigned int __pgtable_l5_enabled; -#ifdef USE_EARLY_PGTABLE_L5 +#ifdef USE_EARLY_PGTABLE /* * cpu_feature_enabled() is not available in early boot code. * Use variable instead. 
@@ -34,7 +34,7 @@ static inline bool pgtable_l5_enabled(void) } #else #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57) -#endif /* USE_EARLY_PGTABLE_L5 */ +#endif /* USE_EARLY_PGTABLE */ #else #define pgtable_l5_enabled() 0 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f125bf7ecb6f..4f08e164c0b1 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* cpu_feature_enabled() cannot be used this early */ -#define USE_EARLY_PGTABLE_L5 +#define USE_EARLY_PGTABLE #include #include diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 29ffa495bd1c..55f5294c3cdf 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -8,7 +8,7 @@ #define DISABLE_BRANCH_PROFILING /* cpu_feature_enabled() cannot be used this early */ -#define USE_EARLY_PGTABLE_L5 +#define USE_EARLY_PGTABLE #include #include diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 296da58f3013..9466d7abae49 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -3,7 +3,7 @@ #define pr_fmt(fmt) "kasan: " fmt /* cpu_feature_enabled() cannot be used this early */ -#define USE_EARLY_PGTABLE_L5 +#define USE_EARLY_PGTABLE #include #include From patchwork Thu Oct 3 21:23:56 2019 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173447 From: Rick Edgecombe Subject: [RFC PATCH 09/13] x86/cpufeature: Add detection of KVM XO Date: Thu, 3 Oct 2019 14:23:56 -0700 Message-Id: <20191003212400.31130-10-rick.p.edgecombe@intel.com> In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Add a new CPUID leaf to hold the contents of CPUID 0x40000030 EAX to detect KVM defined generic VMM features. The leaf was proposed to allow KVM to communicate features that are defined by KVM, but available for any VMM to implement. Add cpu_feature_enabled() support for features in this leaf (KVM XO), and a pgtable_kvmxo_enabled() helper similar to pgtable_l5_enabled() so that pgtable_kvmxo_enabled() can be used in early code that includes arch/x86/include/asm/sparsemem.h.
Lastly, in head64.c detect this feature and perform necessary adjustments to physical_mask. Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/cpufeature.h | 6 ++- arch/x86/include/asm/cpufeatures.h | 2 +- arch/x86/include/asm/disabled-features.h | 3 +- arch/x86/include/asm/pgtable_32_types.h | 1 + arch/x86/include/asm/pgtable_64_types.h | 26 ++++++++++++- arch/x86/include/asm/required-features.h | 3 +- arch/x86/include/asm/sparsemem.h | 4 +- arch/x86/kernel/cpu/common.c | 5 +++ arch/x86/kernel/head64.c | 38 ++++++++++++++++++- .../arch/x86/include/asm/disabled-features.h | 3 +- 10 files changed, 80 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 17127ffbc2a2..7d04ea4f1623 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -82,8 +82,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 19, feature_bit) || \ REQUIRED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 19)) + BUILD_BUG_ON_ZERO(NCAPINTS != 20)) #define DISABLED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \ @@ -105,8 +106,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 19, feature_bit) || \ DISABLED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 19)) + BUILD_BUG_ON_ZERO(NCAPINTS != 20)) #define cpu_has(c, bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ?
1 : \ diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 7ba217e894ea..9c1b07674401 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -13,7 +13,7 @@ /* * Defines x86 CPU feature bits */ -#define NCAPINTS 19 /* N 32-bit words worth of info */ +#define NCAPINTS 20 /* N 32-bit words worth of info */ #define NBUGINTS 1 /* N 32-bit bug flags */ /* diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index a5ea841cc6d2..f0f935f8d917 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -84,6 +84,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 -#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) +#define DISABLED_MASK19 0 +#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) #endif /* _ASM_X86_DISABLED_FEATURES_H */ diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index b0bc0fff5f1f..57a11692715e 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -16,6 +16,7 @@ #endif #define pgtable_l5_enabled() 0 +#define pgtable_kvmxo_enabled() 0 #define PGDIR_SIZE (1UL << PGDIR_SHIFT) #define PGDIR_MASK (~(PGDIR_SIZE - 1)) diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 6b55b837ead4..7c7c9d1a199a 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -43,10 +43,34 @@ static inline bool pgtable_l5_enabled(void) extern unsigned int pgdir_shift; extern unsigned int ptrs_per_p4d; +#ifdef CONFIG_KVM_XO +extern unsigned int __pgtable_kvmxo_enabled; + +#ifdef USE_EARLY_PGTABLE +/* + * cpu_feature_enabled() is not available in early boot code. + * Use variable instead. 
+ */ +static inline bool pgtable_kvmxo_enabled(void) +{ + return __pgtable_kvmxo_enabled; +} +#else +#define pgtable_kvmxo_enabled() cpu_feature_enabled(X86_FEATURE_KVM_XO) +#endif /* USE_EARLY_PGTABLE */ + +#else +#define pgtable_kvmxo_enabled() 0 +#endif /* CONFIG_KVM_XO */ + #endif /* !__ASSEMBLY__ */ #define SHARED_KERNEL_PMD 0 +#if defined(CONFIG_X86_5LEVEL) || defined(CONFIG_KVM_XO) +#define MAX_POSSIBLE_PHYSMEM_BITS 52 +#endif + #ifdef CONFIG_X86_5LEVEL /* @@ -64,8 +88,6 @@ extern unsigned int ptrs_per_p4d; #define P4D_SIZE (_AC(1, UL) << P4D_SHIFT) #define P4D_MASK (~(P4D_SIZE - 1)) -#define MAX_POSSIBLE_PHYSMEM_BITS 52 - #else /* CONFIG_X86_5LEVEL */ /* diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h index 6847d85400a8..fa5700097f64 100644 --- a/arch/x86/include/asm/required-features.h +++ b/arch/x86/include/asm/required-features.h @@ -101,6 +101,7 @@ #define REQUIRED_MASK16 0 #define REQUIRED_MASK17 0 #define REQUIRED_MASK18 0 -#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) +#define REQUIRED_MASK19 0 +#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) #endif /* _ASM_X86_REQUIRED_FEATURES_H */ diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h index 199218719a86..24b305195369 100644 --- a/arch/x86/include/asm/sparsemem.h +++ b/arch/x86/include/asm/sparsemem.h @@ -27,8 +27,8 @@ # endif #else /* CONFIG_X86_32 */ # define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */ -# define MAX_PHYSADDR_BITS (pgtable_l5_enabled() ? 52 : 44) -# define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46) +# define MAX_PHYSADDR_BITS ((pgtable_l5_enabled() ? 52 : 44) - !!pgtable_kvmxo_enabled()) +# define MAX_PHYSMEM_BITS ((pgtable_l5_enabled() ? 
52 : 46) - !!pgtable_kvmxo_enabled()) #endif #endif /* CONFIG_SPARSEMEM */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 4f08e164c0b1..ee204aefbcfd 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -933,6 +933,11 @@ void get_cpu_cap(struct cpuinfo_x86 *c) c->x86_capability[CPUID_D_1_EAX] = eax; } + eax = cpuid_eax(0x40000000); + c->extended_cpuid_level = eax; + if (c->extended_cpuid_level >= 0x40000030) + c->x86_capability[CPUID_4000_0030_EAX] = cpuid_eax(0x40000030); + /* AMD-defined flags: level 0x80000001 */ eax = cpuid_eax(0x80000000); c->extended_cpuid_level = eax; diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 55f5294c3cdf..7091702a7bec 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -52,6 +52,11 @@ unsigned int ptrs_per_p4d __ro_after_init = 1; EXPORT_SYMBOL(ptrs_per_p4d); #endif +#ifdef CONFIG_KVM_XO +unsigned int __pgtable_kvmxo_enabled __ro_after_init; +unsigned int __pgtable_kvmxo_bit __ro_after_init; +#endif /* CONFIG_KVM_XO */ + #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4; EXPORT_SYMBOL(page_offset_base); @@ -73,12 +78,14 @@ static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr) return fixup_pointer(ptr, physaddr); } -#ifdef CONFIG_X86_5LEVEL +#if defined(CONFIG_X86_5LEVEL) || defined(CONFIG_KVM_XO) static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr) { return fixup_pointer(ptr, physaddr); } +#endif +#ifdef CONFIG_X86_5LEVEL static bool __head check_la57_support(unsigned long physaddr) { /* @@ -104,6 +111,33 @@ static bool __head check_la57_support(unsigned long physaddr) } #endif +#ifdef CONFIG_KVM_XO +static void __head check_kvmxo_support(unsigned long physaddr) +{ + unsigned long physbits; + + if ((native_cpuid_eax(0x40000000) < 0x40000030) || + !(native_cpuid_eax(0x40000030) & (1 << (X86_FEATURE_KVM_XO & 31)))) + return; + + if 
(native_cpuid_eax(0x80000000) < 0x80000008) + return; + + physbits = native_cpuid_eax(0x80000008) & 0xff; + + /* + * If KVM XO is active, the top physical address bit is the permission + * bit, so zero it in the mask. + */ + physical_mask &= ~(1UL << physbits); + + *fixup_int(&__pgtable_kvmxo_enabled, physaddr) = 1; + *fixup_int(&__pgtable_kvmxo_bit, physaddr) = physbits; +} +#else /* CONFIG_KVM_XO */ +static void __head check_kvmxo_support(unsigned long physaddr) { } +#endif /* CONFIG_KVM_XO */ + /* Code in __startup_64() can be relocated during execution, but the compiler * doesn't have to generate PC-relative relocations when accessing globals from * that function. Clang actually does not generate them, which leads to @@ -127,6 +161,8 @@ unsigned long __head __startup_64(unsigned long physaddr, la57 = check_la57_support(physaddr); + check_kvmxo_support(physaddr); + /* Is the address too large? */ if (physaddr >> MAX_PHYSMEM_BITS) for (;;); diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h index a5ea841cc6d2..f0f935f8d917 100644 --- a/tools/arch/x86/include/asm/disabled-features.h +++ b/tools/arch/x86/include/asm/disabled-features.h @@ -84,6 +84,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 -#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) +#define DISABLED_MASK19 0 +#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) #endif /* _ASM_X86_DISABLED_FEATURES_H */ From patchwork Thu Oct 3 21:23:57 2019 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173439 From: Rick Edgecombe Subject: [RFC PATCH 10/13] x86/mm: Add NR page bit for KVM XO Date: Thu, 3 Oct 2019 14:23:57 -0700 Message-Id: <20191003212400.31130-11-rick.p.edgecombe@intel.com> In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Add _PAGE_BIT_NR and _PAGE_NR, the values of which are determined dynamically at boot. This page type is only valid after checking for the KVM XO CPUID bit.
Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/pgtable_types.h | 11 +++++++++++ arch/x86/mm/init.c | 3 +++ 2 files changed, 14 insertions(+) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index b5e49e6bac63..d3c92c992089 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -30,6 +30,14 @@ #define _PAGE_BIT_PKEY_BIT3 62 /* Protection Keys, bit 4/4 */ #define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */ +#if defined(CONFIG_KVM_XO) && !defined(__ASSEMBLY__) +extern unsigned int __pgtable_kvmxo_bit; +/* KVM based not-readable: only valid after cpuid check */ +#define _PAGE_BIT_NR (__pgtable_kvmxo_bit) +#else /* defined(CONFIG_KVM_XO) && !defined(__ASSEMBLY__) */ +#define _PAGE_BIT_NR 0 +#endif /* defined(CONFIG_KVM_XO) && !defined(__ASSEMBLY__) */ + #define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1 #define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1 #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ @@ -39,6 +47,9 @@ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL +#define _PAGE_NR (pgtable_kvmxo_enabled() ? 
+ (_AT(pteval_t, 1) << _PAGE_BIT_NR) : 0) + + #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) #define _PAGE_RW (_AT(pteval_t, 1) << _PAGE_BIT_RW) #define _PAGE_USER (_AT(pteval_t, 1) << _PAGE_BIT_USER) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index fd10d91a6115..7298156a76d5 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -195,6 +195,9 @@ static void __init probe_page_size_mask(void) __supported_pte_mask |= _PAGE_GLOBAL; } + if (pgtable_kvmxo_enabled()) + __supported_pte_mask |= _PAGE_NR; + /* By the default is everything supported: */ __default_kernel_pte_mask = __supported_pte_mask; /* Except when with PTI where the kernel is mostly non-Global: */ From patchwork Thu Oct 3 21:23:58 2019 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173435 From: Rick Edgecombe Subject: [RFC PATCH 11/13] x86, ptdump: Add NR bit to page table dump Date: Thu, 3 Oct 2019 14:23:58 -0700 Message-Id: <20191003212400.31130-12-rick.p.edgecombe@intel.com> In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> Add printing of the NR permission to the page table dump code. Signed-off-by: Rick Edgecombe --- arch/x86/mm/dump_pagetables.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index ab67822fd2f4..8932aa9e3a9e 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -182,7 +182,7 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg) if (!(pr & _PAGE_PRESENT)) { /* Not present */ - pt_dump_cont_printf(m, dmsg, " "); + pt_dump_cont_printf(m, dmsg, " "); } else { if (pr & _PAGE_USER) pt_dump_cont_printf(m, dmsg, "USR "); @@ -219,6 +219,10 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg) pt_dump_cont_printf(m, dmsg, "NX "); else pt_dump_cont_printf(m, dmsg, "x "); + if (pr & _PAGE_NR) + pt_dump_cont_printf(m, dmsg, "NR "); + else + pt_dump_cont_printf(m, dmsg, "r "); } pt_dump_cont_printf(m, dmsg, "%s\n", level_name[level]); } From patchwork Thu Oct 3 21:23:59 2019 X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 11173441 From: Rick Edgecombe Subject: [RFC PATCH 12/13] mmap: Add XO support for KVM XO Date: Thu, 3 Oct 2019 14:23:59 -0700 Message-Id: <20191003212400.31130-13-rick.p.edgecombe@intel.com> In-Reply-To: <20191003212400.31130-1-rick.p.edgecombe@intel.com> References: <20191003212400.31130-1-rick.p.edgecombe@intel.com> The KVM XO feature enables the ability
to create execute-only virtual memory. Use this feature to create XO memory when PROT_EXEC is requested without PROT_READ, matching the behavior of protection keys for userspace and of some arm64 platforms. When both protection-key-based and native execute-only memory are available, prefer the KVM XO method of creating execute-only memory so that a protection key is saved. Set the __P100 and __S100 values in protection_map during boot instead of statically, because the actual KVM XO bit in the PTE is determined at boot time and so can't be known at compile time. Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/pgtable_types.h | 2 ++ arch/x86/kernel/head64.c | 3 +++ mm/mmap.c | 30 +++++++++++++++++++++++----- 3 files changed, 30 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index d3c92c992089..fe976b4f0132 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -176,6 +176,8 @@ enum page_cache_mode { _PAGE_ACCESSED | _PAGE_NX) #define PAGE_READONLY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \ _PAGE_ACCESSED) +#define PAGE_EXECONLY __pgprot(_PAGE_PRESENT | _PAGE_USER | \ + _PAGE_ACCESSED | _PAGE_NR) #define __PAGE_KERNEL_EXEC \ (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL) diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 7091702a7bec..69772b6e1810 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -133,6 +133,9 @@ static void __head check_kvmxo_support(unsigned long physaddr) *fixup_int(&__pgtable_kvmxo_enabled, physaddr) = 1; *fixup_int(&__pgtable_kvmxo_bit, physaddr) = physbits; + + protection_map[4] = PAGE_EXECONLY; + protection_map[12] = PAGE_EXECONLY; } #else /* CONFIG_KVM_XO */ static void __head check_kvmxo_support(unsigned long physaddr) { } diff --git a/mm/mmap.c b/mm/mmap.c index 7e8c3e8ae75f..034ffa0255b2 100644 --- a/mm/mmap.c +++
b/mm/mmap.c
@@ -1379,6 +1379,29 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
 	return true;
 }
 
+static inline int get_pkey(unsigned long flags)
+{
+	const unsigned long p_xo = pgprot_val(protection_map[4]);
+	const unsigned long p_xr = pgprot_val(protection_map[5]);
+	const unsigned long s_xo = pgprot_val(protection_map[12]);
+	const unsigned long s_xr = pgprot_val(protection_map[13]);
+	int pkey;
+
+	/* Prefer non-pkey XO capability if available, to save a pkey */
+	if (flags & MAP_PRIVATE && (p_xo != p_xr))
+		return 0;
+
+	if (flags & MAP_SHARED && (s_xo != s_xr))
+		return 0;
+
+	pkey = execute_only_pkey(current->mm);
+	if (pkey < 0)
+		pkey = 0;
+
+	return pkey;
+}
+
 /*
  * The caller must hold down_write(&current->mm->mmap_sem).
  */
@@ -1440,11 +1463,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		return -EEXIST;
 	}
 
-	if (prot == PROT_EXEC) {
-		pkey = execute_only_pkey(mm);
-		if (pkey < 0)
-			pkey = 0;
-	}
+	if (prot == PROT_EXEC)
+		pkey = get_pkey(flags);
 
 	/* Do simple checking here so the lower-level routines won't have
 	 * to.
We assume access permissions have been handled by the open

From patchwork Thu Oct 3 21:24:00 2019
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 11173419
From: Rick Edgecombe
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org, luto@kernel.org, peterz@infradead.org, dave.hansen@intel.com, pbonzini@redhat.com, sean.j.christopherson@intel.com, keescook@chromium.org
Cc: kristen@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe
Subject: [RFC PATCH 13/13] x86/Kconfig: Add Kconfig for KVM based XO
Date: Thu, 3 Oct 2019 14:24:00 -0700
Message-Id: <20191003212400.31130-14-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To:
<20191003212400.31130-1-rick.p.edgecombe@intel.com>
References: <20191003212400.31130-1-rick.p.edgecombe@intel.com>

Add CONFIG_KVM_XO for supporting KVM based execute-only memory.

Signed-off-by: Rick Edgecombe
---
 arch/x86/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..3a3af2a456e8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -802,6 +802,19 @@ config KVM_GUEST
	  underlying device model, the host provides the guest with
	  timing infrastructure such as time of day, and system time
 
+config KVM_XO
+	bool "Support for KVM based execute only virtual memory permissions"
+	select DYNAMIC_PHYSICAL_MASK
+	select SPARSEMEM_VMEMMAP
+	depends on KVM_GUEST && X86_64
+	default y
+	help
+	  This option enables support for execute only memory for KVM guests.
+	  If support from the underlying VMM is not detected at boot, this
+	  capability will be automatically disabled.
+
+	  If you are unsure how to answer this question, answer Y.
+
 config PVH
	bool "Support for running PVH guests"
	---help---