From patchwork Wed Jun 17 19:07:54 2020
X-Patchwork-Submitter: "Andersen, John"
X-Patchwork-Id: 11610451
From: John Andersen
Subject: [PATCH 1/4] X86: Update mmu_cr4_features during feature identification
Date: Wed, 17 Jun 2020 12:07:54 -0700
Message-Id: <20200617190757.27081-2-john.s.andersen@intel.com>
In-Reply-To: <20200617190757.27081-1-john.s.andersen@intel.com>
References: <20200617190757.27081-1-john.s.andersen@intel.com>

In identify_cpu when setting up SMEP/SMAP/UMIP call
cr4_set_bits_and_update_boot instead of cr4_set_bits. This ensures that
mmu_cr4_features contains those bits, and does not disable those
protections when in hibernation asm.

setup_arch updates mmu_cr4_features to save what identified features are
supported for later use in hibernation asm when cr4 needs to be modified
to toggle PGE. cr4 writes happen in restore_image and restore_registers.
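For reference, a minimal sketch of the helper this patch switches to.
cr4_set_bits_and_update_boot() lives in the x86 CR4/tlbflush helpers
(arch/x86/include/asm/tlbflush.h in mainline at the time); the sketch below
is paraphrased from memory and is not part of this diff. The point is that
it records the bits in mmu_cr4_features (and the trampoline copy) in
addition to setting them in the live CR4, which is what keeps the
hibernation asm from dropping SMEP/SMAP/UMIP on resume.

extern unsigned long mmu_cr4_features;
extern u32 *trampoline_cr4_features;

/* Approximate sketch of the existing helper, shown for context only. */
static inline void cr4_set_bits_and_update_boot(unsigned long mask)
{
	/* Remember the bits for boot/resume paths (hibernation asm, trampoline). */
	mmu_cr4_features |= mask;
	if (trampoline_cr4_features)
		*trampoline_cr4_features = mmu_cr4_features;
	/* ...and set them in the live CR4, as plain cr4_set_bits() would. */
	cr4_set_bits(mask);
}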
setup_arch occurs before identify_cpu, this leads to mmu_cr4_features not containing some of the cr4 features which were enabled via identify_cpu when hibernation asm is executed. On CPU bringup when cr4_set_bits_and_update_boot is called mmu_cr4_features will now be written to. For the boot CPU, the __ro_after_init on mmu_cr4_features does not cause a fault. However, __ro_after_init was removed due to it triggering faults on non-boot CPUs. Signed-off-by: John Andersen --- arch/x86/kernel/cpu/common.c | 6 +++--- arch/x86/kernel/setup.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index d07809286b95..921e67086a00 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -297,7 +297,7 @@ __setup("nosmep", setup_disable_smep); static __always_inline void setup_smep(struct cpuinfo_x86 *c) { if (cpu_has(c, X86_FEATURE_SMEP)) - cr4_set_bits(X86_CR4_SMEP); + cr4_set_bits_and_update_boot(X86_CR4_SMEP); } static __init int setup_disable_smap(char *arg) @@ -316,7 +316,7 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_SMAP)) { #ifdef CONFIG_X86_SMAP - cr4_set_bits(X86_CR4_SMAP); + cr4_set_bits_and_update_boot(X86_CR4_SMAP); #else cr4_clear_bits(X86_CR4_SMAP); #endif @@ -333,7 +333,7 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c) if (!cpu_has(c, X86_FEATURE_UMIP)) goto out; - cr4_set_bits(X86_CR4_UMIP); + cr4_set_bits_and_update_boot(X86_CR4_UMIP); pr_info_once("x86/cpu: User Mode Instruction Prevention (UMIP) activated\n"); diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index a3767e74c758..d9c678b37a9b 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -138,9 +138,9 @@ EXPORT_SYMBOL(boot_cpu_data); #if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64) -__visible unsigned long mmu_cr4_features __ro_after_init; +__visible unsigned long mmu_cr4_features; #else -__visible unsigned long mmu_cr4_features __ro_after_init = X86_CR4_PAE; +__visible unsigned long mmu_cr4_features = X86_CR4_PAE; #endif /* Boot loader ID and version as integers, for the benefit of proc_dointvec */ From patchwork Wed Jun 17 19:07:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andersen, John" X-Patchwork-Id: 11610455 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 43E941731 for ; Wed, 17 Jun 2020 19:05:36 +0000 (UTC) Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.kernel.org (Postfix) with SMTP id 53D0B2088E for ; Wed, 17 Jun 2020 19:05:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 53D0B2088E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kernel-hardening-return-19001-patchwork-kernel-hardening=patchwork.kernel.org@lists.openwall.com Received: (qmail 7993 invoked by uid 550); 17 Jun 2020 19:05:22 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 7840 invoked from network); 17 Jun 2020 19:05:20 -0000 IronPort-SDR: 
XxU3uERcI0+Q2QFWxvpRzepLjn/26N+I6JisR5+w/lGqRBcaAAp1I10Auc9/bn6BiNxRunV/pM SX4ranUBy08g== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False IronPort-SDR: jsScCobBAxNj1+5VqD2s0okJHLATdhpgHIK+77640LR3Z49+tPFiJIh8n9cZE5wcdhDqg4mYTy E/N03xO2s5tQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,523,1583222400"; d="scan'208";a="273609635" From: John Andersen To: corbet@lwn.net, pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, shuah@kernel.org, sean.j.christopherson@intel.com, liran.alon@oracle.com, drjones@redhat.com, rick.p.edgecombe@intel.com, kristen@linux.intel.com Cc: vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org, mchehab+huawei@kernel.org, gregkh@linuxfoundation.org, paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com, jgross@suse.com, mike.kravetz@oracle.com, oneukum@suse.com, luto@kernel.org, peterz@infradead.org, fenghua.yu@intel.com, reinette.chatre@intel.com, vineela.tummalapalli@intel.com, dave.hansen@linux.intel.com, john.s.andersen@intel.com, arjan@linux.intel.com, caoj.fnst@cn.fujitsu.com, bhe@redhat.com, nivedita@alum.mit.edu, keescook@chromium.org, dan.j.williams@intel.com, eric.auger@redhat.com, aaronlewis@google.com, peterx@redhat.com, makarandsonare@google.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-hardening@lists.openwall.com Subject: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning Date: Wed, 17 Jun 2020 12:07:55 -0700 Message-Id: <20200617190757.27081-3-john.s.andersen@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200617190757.27081-1-john.s.andersen@intel.com> References: <20200617190757.27081-1-john.s.andersen@intel.com> MIME-Version: 1.0 Add a CR pin feature bit to the KVM cpuid. Add read only MSRs to KVM which guests use to identify which bits they may request be pinned. Add CR pinned MSRs to KVM. Allow guests to request that KVM pin certain bits within control register 0 or 4 via the CR pinned MSRs. Writes to the MSRs fail if they include bits which aren't allowed. Host userspace may clear or modify pinned bits at any time. Once pinned bits are set, the guest may pin additional allowed bits, but not clear any. Clear pinning on vCPU reset. In the event that the guest vCPU attempts to disable any of the pinned bits, send that vCPU a general protection fault, and leave the register unchanged. Clear pinning on vCPU reset to avoid faulting non-boot CPUs when they are disabled and then re-enabled, which is done when hibernating. Pinning is not active when running in SMM. Entering SMM disables pinned bits. Writes to control registers within SMM would therefore trigger general protection faults if pinning was enforced. Upon exit from SMM, SMRAM is modified to ensure the values of CR0/4 that will be restored contain the correct values for pinned bits. CR0/4 values are then restored from SMRAM as usual. When running with nested virtualization, should pinned bits be cleared from host VMCS / VMCB, on VM-Exit, they will be silently restored. Should userspace expose the CR pinning CPUID feature bit, it must zero CR pinned MSRs on reboot. If it does not, it runs the risk of having the guest enable pinning and subsequently cause general protection faults on next boot due to early boot code setting control registers to values which do not contain the pinned bits. Userspace is responsible for migrating the contents of the CR* pinned MSRs. 
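To illustrate that userspace responsibility, here is a hedged sketch of how
a VMM might save the pinned-bit state with the standard KVM_GET_MSRS vCPU
ioctl. The MSR indices are the ones introduced by this patch; everything
else (vcpu_fd, the wrapper name) is assumed for the example and is not
defined by the patch.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* MSR indices added by this patch (arch/x86/include/uapi/asm/kvm_para.h). */
#define MSR_KVM_CR0_PINNED_LOW   0x4b564d0a
#define MSR_KVM_CR0_PINNED_HIGH  0x4b564d0b
#define MSR_KVM_CR4_PINNED_LOW   0x4b564d0c
#define MSR_KVM_CR4_PINNED_HIGH  0x4b564d0d

/* Read the four pinned-bit MSRs from a vCPU so they can be carried across
 * migration.  vcpu_fd is the usual KVM vCPU file descriptor. */
static int save_cr_pinned_msrs(int vcpu_fd, uint64_t out[4])
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entries[4];
	} msrs = {};
	static const uint32_t idx[4] = {
		MSR_KVM_CR0_PINNED_LOW, MSR_KVM_CR0_PINNED_HIGH,
		MSR_KVM_CR4_PINNED_LOW, MSR_KVM_CR4_PINNED_HIGH,
	};
	int i, ret;

	msrs.hdr.nmsrs = 4;
	for (i = 0; i < 4; i++)
		msrs.entries[i].index = idx[i];

	ret = ioctl(vcpu_fd, KVM_GET_MSRS, &msrs);
	if (ret < 0)
		return ret;

	for (i = 0; i < 4; i++)
		out[i] = msrs.entries[i].data;

	/* On the destination the same entries are written back with
	 * KVM_SET_MSRS; host-initiated writes are allowed to relax pinning. */
	return 0;
}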
If userspace fails to migrate the MSRs the protection will no longer be active. Pinning of sensitive CR bits has already been implemented to protect against exploits directly calling native_write_cr*(). The current protection cannot stop ROP attacks which jump directly to a MOV CR instruction. https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/ Guests running with paravirtualized CR pinning can now be protected against the use of ROP to disable CR bits. The same bits that are being pinned natively may be pinned via the CR pinned MSRs. These bits are WP in CR0, and SMEP, SMAP, and UMIP in CR4. Other hypervisors such as HyperV have implemented similar protections for Control Registers and MSRs; which security researchers have found effective. https://www.abatchy.com/2018/01/kernel-exploitation-4 Future work could implement similar MSRs to protect bits elsewhere, such as MSRs. The NXE bit of the EFER MSR is a prime candidate. Changes for QEMU are required to expose the CR pin cpuid feature bit. As well as clear the MSRs on reboot and enable migration. https://github.com/qemu/qemu/commit/1b26f03653669c97dd8729f9f59be561d68e2b2d https://github.com/qemu/qemu/commit/3af36d57457892c3088ee88de759d4f258c159a7 Signed-off-by: John Andersen --- Documentation/virt/kvm/msr.rst | 53 ++++++++++++++ arch/x86/include/asm/kvm_host.h | 7 ++ arch/x86/include/uapi/asm/kvm_para.h | 7 ++ arch/x86/kvm/cpuid.c | 3 +- arch/x86/kvm/emulate.c | 3 +- arch/x86/kvm/kvm_emulate.h | 2 +- arch/x86/kvm/svm/nested.c | 11 ++- arch/x86/kvm/vmx/nested.c | 10 ++- arch/x86/kvm/x86.c | 106 ++++++++++++++++++++++++++- 9 files changed, 193 insertions(+), 9 deletions(-) diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst index e37a14c323d2..9fa43c4a5895 100644 --- a/Documentation/virt/kvm/msr.rst +++ b/Documentation/virt/kvm/msr.rst @@ -376,3 +376,56 @@ data: write '1' to bit 0 of the MSR, this causes the host to re-scan its queue and check if there are more notifications pending. The MSR is available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID. + +MSR_KVM_CR0_PIN_ALLOWED: + 0x4b564d08 +MSR_KVM_CR4_PIN_ALLOWED: + 0x4b564d09 + + Read only registers informing the guest which bits may be pinned for + each control register respectively via the CR pinned MSRs. + +data: + Bits which may be pinned. + + Attempting to pin bits other than these will result in a failure when + writing to the respective CR pinned MSR. + + Bits which are allowed to be pinned are WP for CR0 and SMEP, SMAP, and + UMIP for CR4. + +MSR_KVM_CR0_PINNED_LOW: + 0x4b564d0a +MSR_KVM_CR0_PINNED_HIGH: + 0x4b564d0b +MSR_KVM_CR4_PINNED_LOW: + 0x4b564d0c +MSR_KVM_CR4_PINNED_HIGH: + 0x4b564d0d + + Used to configure pinned bits in control registers + +data: + Bits to be pinned. + + Fails if data contains bits which are not allowed to be pinned. Or if + attempting to pin bits high that are already pinned low, or vice versa. + Bits which are allowed to be pinned can be found by reading the CR pin + allowed MSRs. + + The MSRs are read/write for host userspace, and write-only for the + guest. + + Once set to a non-zero value, the guest cannot clear any of the bits + that have been pinned. The guest can pin more bits, so long as those + bits appear in the allowed MSR, and are not already pinned to the + opposite value. + + Host userspace may clear or change pinned bits at any point. Host + userspace must clear pinned bits on reboot. + + The MSR enables bit pinning for control registers. 
Pinning is active + when the guest is not in SMM. If an exit from SMM results in pinned + bits becoming unpinned. The guest will exit. If the guest attempts to + write values to cr* where bits differ from pinned bits, the write will + fail and the guest will be sent a general protection fault. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f8998e97457f..962cb48535d4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -565,6 +565,11 @@ struct kvm_vcpu_hv { cpumask_t tlb_flush; }; +struct kvm_vcpu_cr_pinning { + unsigned long high; + unsigned long low; +}; + struct kvm_vcpu_arch { /* * rip and regs accesses must go through @@ -576,10 +581,12 @@ struct kvm_vcpu_arch { unsigned long cr0; unsigned long cr0_guest_owned_bits; + struct kvm_vcpu_cr_pinning cr0_pinned; unsigned long cr2; unsigned long cr3; unsigned long cr4; unsigned long cr4_guest_owned_bits; + struct kvm_vcpu_cr_pinning cr4_pinned; unsigned long cr8; u32 host_pkru; u32 pkru; diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 812e9b4c1114..91241e0d9691 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -32,6 +32,7 @@ #define KVM_FEATURE_POLL_CONTROL 12 #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 +#define KVM_FEATURE_CR_PIN 15 #define KVM_HINTS_REALTIME 0 @@ -53,6 +54,12 @@ #define MSR_KVM_POLL_CONTROL 0x4b564d05 #define MSR_KVM_ASYNC_PF_INT 0x4b564d06 #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07 +#define MSR_KVM_CR0_PIN_ALLOWED 0x4b564d08 +#define MSR_KVM_CR4_PIN_ALLOWED 0x4b564d09 +#define MSR_KVM_CR0_PINNED_LOW 0x4b564d0a +#define MSR_KVM_CR0_PINNED_HIGH 0x4b564d0b +#define MSR_KVM_CR4_PINNED_LOW 0x4b564d0c +#define MSR_KVM_CR4_PINNED_HIGH 0x4b564d0d struct kvm_steal_time { __u64 steal; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 8a294f9747aa..bb0ed645324d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) (1 << KVM_FEATURE_PV_SEND_IPI) | (1 << KVM_FEATURE_POLL_CONTROL) | (1 << KVM_FEATURE_PV_SCHED_YIELD) | - (1 << KVM_FEATURE_ASYNC_PF_INT); + (1 << KVM_FEATURE_ASYNC_PF_INT) | + (1 << KVM_FEATURE_CR_PIN); if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d0e2825ae617..95780422765b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2685,7 +2685,8 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt) return X86EMUL_UNHANDLEABLE; } - ctxt->ops->post_leave_smm(ctxt); + if (ctxt->ops->post_leave_smm(ctxt)) + return X86EMUL_UNHANDLEABLE; return X86EMUL_CONTINUE; } diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h index 43c93ffa76ed..e92dd7605e48 100644 --- a/arch/x86/kvm/kvm_emulate.h +++ b/arch/x86/kvm/kvm_emulate.h @@ -232,7 +232,7 @@ struct x86_emulate_ops { void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags); int (*pre_leave_smm)(struct x86_emulate_ctxt *ctxt, const char *smstate); - void (*post_leave_smm)(struct x86_emulate_ctxt *ctxt); + int (*post_leave_smm)(struct x86_emulate_ctxt *ctxt); int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr); }; diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 6bceafb19108..245bdff4b052 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -583,8 +583,15 @@ int nested_svm_vmexit(struct vcpu_svm *svm) 
svm->vmcb->save.idtr = hsave->save.idtr; kvm_set_rflags(&svm->vcpu, hsave->save.rflags); svm_set_efer(&svm->vcpu, hsave->save.efer); - svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE); - svm_set_cr4(&svm->vcpu, hsave->save.cr4); + svm_set_cr0(&svm->vcpu, + (hsave->save.cr0 | + X86_CR0_PE | + svm->vcpu.arch.cr0_pinned.high) & + ~svm->vcpu.arch.cr0_pinned.low); + svm_set_cr4(&svm->vcpu, + (hsave->save.cr4 | + svm->vcpu.arch.cr4_pinned.high) & + ~svm->vcpu.arch.cr4_pinned.low); if (npt_enabled) { svm->vmcb->save.cr3 = hsave->save.cr3; svm->vcpu.arch.cr3 = hsave->save.cr3; diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index adb11b504d5c..a12bac57b374 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4110,11 +4110,17 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, * (KVM doesn't change it); */ vcpu->arch.cr0_guest_owned_bits = X86_CR0_TS; - vmx_set_cr0(vcpu, vmcs12->host_cr0); + vmx_set_cr0(vcpu, + (vmcs12->host_cr0 | + vcpu->arch.cr0_pinned.high) & + ~vcpu->arch.cr0_pinned.low); /* Same as above - no reason to call set_cr4_guest_host_mask(). */ vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK); - vmx_set_cr4(vcpu, vmcs12->host_cr4); + vmx_set_cr4(vcpu, + (vmcs12->host_cr4 | + vcpu->arch.cr4_pinned.high) & + ~vcpu->arch.cr4_pinned.low); nested_ept_uninit_mmu_context(vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 00c88c2f34e4..940de9a968bd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -772,6 +772,9 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(pdptrs_changed); +#define KVM_CR0_PIN_ALLOWED (X86_CR0_WP) +#define KVM_CR4_PIN_ALLOWED (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP) + int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { unsigned long old_cr0 = kvm_read_cr0(vcpu); @@ -792,6 +795,11 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) return 1; + if (!is_smm(vcpu) && !is_guest_mode(vcpu) && + (((cr0 ^ vcpu->arch.cr0_pinned.high) & vcpu->arch.cr0_pinned.high) || + ((~cr0 ^ vcpu->arch.cr0_pinned.low) & vcpu->arch.cr0_pinned.low))) + return 1; + if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { #ifdef CONFIG_X86_64 if ((vcpu->arch.efer & EFER_LME)) { @@ -972,6 +980,11 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if (kvm_valid_cr4(vcpu, cr4)) return 1; + if (!is_smm(vcpu) && !is_guest_mode(vcpu) && + (((cr4 ^ vcpu->arch.cr4_pinned.high) & vcpu->arch.cr4_pinned.high) || + ((~cr4 ^ vcpu->arch.cr4_pinned.low) & vcpu->arch.cr4_pinned.low))) + return 1; + if (is_long_mode(vcpu)) { if (!(cr4 & X86_CR4_PAE)) return 1; @@ -1291,6 +1304,12 @@ static const u32 emulated_msrs_all[] = { MSR_K7_HWCR, MSR_KVM_POLL_CONTROL, + MSR_KVM_CR0_PIN_ALLOWED, + MSR_KVM_CR4_PIN_ALLOWED, + MSR_KVM_CR0_PINNED_LOW, + MSR_KVM_CR0_PINNED_HIGH, + MSR_KVM_CR4_PINNED_LOW, + MSR_KVM_CR4_PINNED_HIGH, }; static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)]; @@ -2986,6 +3005,54 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vcpu->arch.msr_kvm_poll_control = data; break; + case MSR_KVM_CR0_PIN_ALLOWED: + return (data != KVM_CR0_PIN_ALLOWED); + case MSR_KVM_CR4_PIN_ALLOWED: + return (data != KVM_CR4_PIN_ALLOWED); + case MSR_KVM_CR0_PINNED_LOW: + if ((data & ~KVM_CR0_PIN_ALLOWED) || + (data & vcpu->arch.cr0_pinned.high)) + return 1; + + if (!msr_info->host_initiated && + (~data & vcpu->arch.cr0_pinned.low)) + return 1; + + vcpu->arch.cr0_pinned.low = data; + break; + case MSR_KVM_CR0_PINNED_HIGH: + 
if ((data & ~KVM_CR0_PIN_ALLOWED) || + (data & vcpu->arch.cr0_pinned.low)) + return 1; + + if (!msr_info->host_initiated && + (~data & vcpu->arch.cr0_pinned.high)) + return 1; + + vcpu->arch.cr0_pinned.high = data; + break; + case MSR_KVM_CR4_PINNED_LOW: + if ((data & ~KVM_CR4_PIN_ALLOWED) || + (data & vcpu->arch.cr4_pinned.high)) + return 1; + + if (!msr_info->host_initiated && + (~data & vcpu->arch.cr4_pinned.low)) + return 1; + + vcpu->arch.cr4_pinned.low = data; + break; + case MSR_KVM_CR4_PINNED_HIGH: + if ((data & ~KVM_CR4_PIN_ALLOWED) || + (data & vcpu->arch.cr4_pinned.low)) + return 1; + + if (!msr_info->host_initiated && + (~data & vcpu->arch.cr4_pinned.high)) + return 1; + + vcpu->arch.cr4_pinned.high = data; + break; case MSR_IA32_MCG_CTL: case MSR_IA32_MCG_STATUS: case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: @@ -3250,6 +3317,24 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_KVM_POLL_CONTROL: msr_info->data = vcpu->arch.msr_kvm_poll_control; break; + case MSR_KVM_CR0_PIN_ALLOWED: + msr_info->data = KVM_CR0_PIN_ALLOWED; + break; + case MSR_KVM_CR4_PIN_ALLOWED: + msr_info->data = KVM_CR4_PIN_ALLOWED; + break; + case MSR_KVM_CR0_PINNED_LOW: + msr_info->data = vcpu->arch.cr0_pinned.low; + break; + case MSR_KVM_CR0_PINNED_HIGH: + msr_info->data = vcpu->arch.cr0_pinned.high; + break; + case MSR_KVM_CR4_PINNED_LOW: + msr_info->data = vcpu->arch.cr4_pinned.low; + break; + case MSR_KVM_CR4_PINNED_HIGH: + msr_info->data = vcpu->arch.cr4_pinned.high; + break; case MSR_IA32_P5_MC_ADDR: case MSR_IA32_P5_MC_TYPE: case MSR_IA32_MCG_CAP: @@ -6414,9 +6499,23 @@ static int emulator_pre_leave_smm(struct x86_emulate_ctxt *ctxt, return kvm_x86_ops.pre_leave_smm(emul_to_vcpu(ctxt), smstate); } -static void emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt) +static int emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt) { - kvm_smm_changed(emul_to_vcpu(ctxt)); + struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); + unsigned long cr0 = kvm_read_cr0(vcpu); + unsigned long cr4 = kvm_read_cr4(vcpu); + + if (((cr0 ^ vcpu->arch.cr0_pinned.high) & vcpu->arch.cr0_pinned.high) || + ((~cr0 ^ vcpu->arch.cr0_pinned.low) & vcpu->arch.cr0_pinned.low)) + return 1; + + if (((cr4 ^ vcpu->arch.cr4_pinned.high) & vcpu->arch.cr4_pinned.high) || + ((~cr4 ^ vcpu->arch.cr4_pinned.low) & vcpu->arch.cr4_pinned.low)) + return 1; + + kvm_smm_changed(vcpu); + + return 0; } static int emulator_set_xcr(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr) @@ -9640,6 +9739,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vcpu->arch.ia32_xss = 0; + memset(&vcpu->arch.cr0_pinned, 0, sizeof(vcpu->arch.cr0_pinned)); + memset(&vcpu->arch.cr4_pinned, 0, sizeof(vcpu->arch.cr4_pinned)); + kvm_x86_ops.vcpu_reset(vcpu, init_event); } From patchwork Wed Jun 17 19:07:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andersen, John" X-Patchwork-Id: 11610459 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 472C613A0 for ; Wed, 17 Jun 2020 19:05:45 +0000 (UTC) Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.kernel.org (Postfix) with SMTP id 828E5212CC for ; Wed, 17 Jun 2020 19:05:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 828E5212CC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com 
From: John Andersen
Subject: [PATCH 3/4] selftests: kvm: add test for CR pinning with SMM
Date: Wed, 17 Jun 2020 12:07:56 -0700
Message-Id: <20200617190757.27081-4-john.s.andersen@intel.com>
In-Reply-To: <20200617190757.27081-1-john.s.andersen@intel.com>
References: <20200617190757.27081-1-john.s.andersen@intel.com>

Check that paravirtualized control register pinning blocks modifications
of pinned CR values stored in SMRAM on exit from SMM.
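The failure leg of this test hinges on the check patch 2 adds to
emulator_post_leave_smm(): if the CR0/CR4 values about to be restored from
SMRAM no longer satisfy the pinning, the RSM is refused and the vCPU exits
with an emulation error instead of silently resuming with weakened
protections. Condensed into a standalone helper purely for illustration
(the helper name is invented here; patch 2 open-codes the same two
expressions in kvm_set_cr0(), kvm_set_cr4() and emulator_post_leave_smm()):

/* Illustrative only: true if @val violates the pinning requested through
 * the PINNED_HIGH/PINNED_LOW MSRs. */
static bool cr_pin_violated(unsigned long val,
			    unsigned long pinned_high,
			    unsigned long pinned_low)
{
	/* Every bit pinned high must be set in val... */
	if ((val ^ pinned_high) & pinned_high)
		return true;
	/* ...and every bit pinned low must be clear in val. */
	if ((~val ^ pinned_low) & pinned_low)
		return true;
	return false;
}

This is why the failure case below rewrites the CR0/CR4 save slots in SMRAM
and then expects KVM_EXIT_INTERNAL_ERROR with suberror
KVM_INTERNAL_ERROR_EMULATION rather than a normal KVM_EXIT_IO.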
Signed-off-by: John Andersen --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 13 ++ .../selftests/kvm/x86_64/smm_cr_pin_test.c | 207 ++++++++++++++++++ 4 files changed, 222 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 452787152748..c0666c3efcbb 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -10,6 +10,7 @@ /x86_64/platform_info_test /x86_64/set_sregs_test /x86_64/smm_test +/x86_64/smm_cr_pin_test /x86_64/state_test /x86_64/vmx_preemption_timer_test /x86_64/svm_vmcall_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 4a166588d99f..c5c205637f38 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -45,6 +45,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test +TEST_GEN_PROGS_x86_64 += x86_64/smm_cr_pin_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h index 82b7fe16a824..8a2da0449772 100644 --- a/tools/testing/selftests/kvm/include/x86_64/processor.h +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h @@ -200,6 +200,11 @@ static inline uint64_t get_cr0(void) return cr0; } +static inline void set_cr0(uint64_t val) +{ + __asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory"); +} + static inline uint64_t get_cr3(void) { uint64_t cr3; @@ -383,4 +388,12 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits); /* VMX_EPT_VPID_CAP bits */ #define VMX_EPT_VPID_CAP_AD_BITS (1ULL << 21) +/* KVM MSRs */ +#define MSR_KVM_CR0_PIN_ALLOWED 0x4b564d08 +#define MSR_KVM_CR4_PIN_ALLOWED 0x4b564d09 +#define MSR_KVM_CR0_PINNED_LOW 0x4b564d0a +#define MSR_KVM_CR0_PINNED_HIGH 0x4b564d0b +#define MSR_KVM_CR4_PINNED_LOW 0x4b564d0c +#define MSR_KVM_CR4_PINNED_HIGH 0x4b564d0d + #endif /* SELFTEST_KVM_PROCESSOR_H */ diff --git a/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c b/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c new file mode 100644 index 000000000000..a32f577ca1e5 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c @@ -0,0 +1,207 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Tests for control register pinning not being affected by SMRAM writes. 
+ */ +#define _GNU_SOURCE /* for program_invocation_short_name */ +#include +#include +#include +#include +#include +#include +#include + +#include "test_util.h" + +#include "kvm_util.h" + +#include "processor.h" + +#define VCPU_ID 1 + +#define PAGE_SIZE 4096 + +#define SMRAM_SIZE 65536 +#define SMRAM_MEMSLOT ((1 << 16) | 1) +#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE) +#define SMRAM_GPA 0x1000000 +#define SMRAM_STAGE_SUCCESS 0xfe +#define SMRAM_STAGE_FAILURE 0xfd + +#define XSTR(s) __stringify(s) + +#define SYNC_PORT 0xe +#define DONE 0xff + +#define CR0_PINNED X86_CR0_WP +#define CR4_PINNED (X86_CR4_SMAP | X86_CR4_UMIP) +#define CR4_ALL (CR4_PINNED | X86_CR4_SMEP) + +/* + * This is compiled as normal 64-bit code, however, SMI handler is executed + * in real-address mode. To stay simple we're limiting ourselves to a mode + * independent subset of asm here. + * SMI handler always report back fixed stage SMRAM_STAGE_SUCCESS. + */ +uint8_t smi_handler_success[] = { + 0xb0, SMRAM_STAGE_SUCCESS, /* mov $SMRAM_STAGE_SUCCESS, %al */ + 0xe4, SYNC_PORT, /* in $SYNC_PORT, %al */ + 0x0f, 0xaa, /* rsm */ +}; +uint8_t smi_handler_fault[] = { + 0xb0, SMRAM_STAGE_FAILURE, /* mov $SMRAM_STAGE_FAILURE, %al */ + 0xe4, SYNC_PORT, /* in $SYNC_PORT, %al */ + 0x0f, 0xaa, /* rsm */ +}; + +/* We opt not to use GUEST_SYNC() here because we also have to make a sync call + * from SMM. As such, the address of the ucall struct we'd need to pass isn't + * something we can put into the above machine code in a maintainable way + */ +static inline void sync_with_host(uint64_t phase) +{ + asm volatile("in $" XSTR(SYNC_PORT)", %%al\n" + : "+a" (phase)); +} + +void self_smi(void) +{ + wrmsr(APIC_BASE_MSR + (APIC_ICR >> 4), + APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI); +} + +void guest_code(void) +{ + uint64_t apicbase = rdmsr(MSR_IA32_APICBASE); + + sync_with_host(1); + + wrmsr(MSR_IA32_APICBASE, apicbase | X2APIC_ENABLE); + + sync_with_host(2); + + set_cr0(get_cr0() | CR0_PINNED); + + wrmsr(MSR_KVM_CR0_PINNED_HIGH, CR0_PINNED); + + sync_with_host(3); + + set_cr4(get_cr4() | CR4_PINNED); + + sync_with_host(4); + + /* Pin SMEP low */ + wrmsr(MSR_KVM_CR4_PINNED_HIGH, CR4_PINNED); + + sync_with_host(5); + + self_smi(); + + sync_with_host(DONE); +} + +int main(int argc, char *argv[]) +{ + struct kvm_regs regs; + struct kvm_sregs sregs; + struct kvm_vm *vm; + struct kvm_run *run; + struct kvm_x86_state *state; + int failure, stage, stage_reported; + u64 *cr; + + for (failure = 0; failure <= 1; failure++) { + stage_reported = 0; + + /* Create VM */ + vm = vm_create_default(VCPU_ID, 0, guest_code); + + vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); + + run = vcpu_state(vm, VCPU_ID); + + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA, + SMRAM_MEMSLOT, SMRAM_PAGES, 0); + TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT) + == SMRAM_GPA, "could not allocate guest physical addresses?"); + + memset(addr_gpa2hva(vm, SMRAM_GPA), 0x0, SMRAM_SIZE); + if (failure) { + memcpy(addr_gpa2hva(vm, SMRAM_GPA) + 0x8000, smi_handler_fault, + sizeof(smi_handler_fault)); + } else { + memcpy(addr_gpa2hva(vm, SMRAM_GPA) + 0x8000, smi_handler_success, + sizeof(smi_handler_success)); + } + vcpu_set_msr(vm, VCPU_ID, MSR_IA32_SMBASE, SMRAM_GPA); + + for (stage = 1;; stage++) { + _vcpu_run(vm, VCPU_ID); + + memset(®s, 0, sizeof(regs)); + vcpu_regs_get(vm, VCPU_ID, ®s); + + memset(&sregs, 0, sizeof(sregs)); + vcpu_sregs_get(vm, VCPU_ID, &sregs); + + /* stage_reported is currrently the last stage reported 
*/ + if (failure && stage_reported == SMRAM_STAGE_FAILURE) { + /* Ensure that we exit on smram modification of CR4 */ + TEST_ASSERT(run->exit_reason == KVM_EXIT_INTERNAL_ERROR && + run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION, + "Stage %d: unexpected exit reason: %u, suberror: %u (%s),\n", + stage, run->exit_reason, run->internal.suberror, + exit_reason_str(run->exit_reason)); + if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR) + goto done; + } else { + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, + "Stage %d: unexpected exit reason: %u (%s),\n", + stage, run->exit_reason, + exit_reason_str(run->exit_reason)); + } + + stage_reported = regs.rax & 0xff; + + if (stage_reported == DONE) { + TEST_ASSERT((sregs.cr0 & CR0_PINNED) == CR0_PINNED, + "Unexpected cr0. Bits missing: %llx", + sregs.cr0 ^ (CR0_PINNED | sregs.cr0)); + TEST_ASSERT((sregs.cr4 & CR4_ALL) == CR4_PINNED, + "Unexpected cr4. Bits missing: %llx, cr4: %llx", + sregs.cr4 ^ (CR4_ALL | sregs.cr4), + sregs.cr4); + goto done; + } + + TEST_ASSERT(stage_reported == stage || + stage_reported == SMRAM_STAGE_SUCCESS || + stage_reported == SMRAM_STAGE_FAILURE, + "Unexpected stage: #%x, got %x", + stage, stage_reported); + + /* Within SMM modify CR0/4 to not contain pinned bits. */ + if (stage_reported == SMRAM_STAGE_FAILURE) { + cr = (u64 *)(addr_gpa2hva(vm, SMRAM_GPA + 0x8000 + 0x7f58)); + *cr &= ~CR0_PINNED; + + cr = (u64 *)(addr_gpa2hva(vm, SMRAM_GPA + 0x8000 + 0x7f48)); + /* Unset pinned, set one that was pinned low */ + *cr &= ~CR4_PINNED; + *cr |= X86_CR4_SMEP; + } + + state = vcpu_save_state(vm, VCPU_ID); + kvm_vm_release(vm); + kvm_vm_restart(vm, O_RDWR); + vm_vcpu_add(vm, VCPU_ID); + vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); + vcpu_load_state(vm, VCPU_ID, state); + run = vcpu_state(vm, VCPU_ID); + free(state); + } + +done: + kvm_vm_free(vm); + } +} From patchwork Wed Jun 17 19:07:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andersen, John" X-Patchwork-Id: 11610461 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E337C1731 for ; Wed, 17 Jun 2020 19:05:54 +0000 (UTC) Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by mail.kernel.org (Postfix) with SMTP id 29D7B2088E for ; Wed, 17 Jun 2020 19:05:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 29D7B2088E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kernel-hardening-return-19003-patchwork-kernel-hardening=patchwork.kernel.org@lists.openwall.com Received: (qmail 9319 invoked by uid 550); 17 Jun 2020 19:05:25 -0000 Mailing-List: contact kernel-hardening-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Delivered-To: mailing list kernel-hardening@lists.openwall.com Received: (qmail 8167 invoked from network); 17 Jun 2020 19:05:24 -0000 IronPort-SDR: T1UOyoEb5Y/SY0h7uuwT0vXI/bXIl1aeL8vEJrmThXEuRc0Mqa7ynfYCwbRfmq3x1OOPzOaKAv tfG55o0YozNA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False IronPort-SDR: 16liMDp6fjMJskgBNoaSNQ/ghF+MJxz6rqa/Oic4ytifJV5tNpXBCV6ChubOykl3ZRrsk1hjFv 8O6IvGieAe+Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,523,1583222400"; d="scan'208";a="273609675" From: John Andersen To: corbet@lwn.net, 
pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, shuah@kernel.org, sean.j.christopherson@intel.com, liran.alon@oracle.com, drjones@redhat.com, rick.p.edgecombe@intel.com, kristen@linux.intel.com Cc: vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org, mchehab+huawei@kernel.org, gregkh@linuxfoundation.org, paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com, jgross@suse.com, mike.kravetz@oracle.com, oneukum@suse.com, luto@kernel.org, peterz@infradead.org, fenghua.yu@intel.com, reinette.chatre@intel.com, vineela.tummalapalli@intel.com, dave.hansen@linux.intel.com, john.s.andersen@intel.com, arjan@linux.intel.com, caoj.fnst@cn.fujitsu.com, bhe@redhat.com, nivedita@alum.mit.edu, keescook@chromium.org, dan.j.williams@intel.com, eric.auger@redhat.com, aaronlewis@google.com, peterx@redhat.com, makarandsonare@google.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-hardening@lists.openwall.com Subject: [PATCH 4/4] X86: Use KVM CR pin MSRs Date: Wed, 17 Jun 2020 12:07:57 -0700 Message-Id: <20200617190757.27081-5-john.s.andersen@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200617190757.27081-1-john.s.andersen@intel.com> References: <20200617190757.27081-1-john.s.andersen@intel.com> MIME-Version: 1.0 Strengthen existing control register pinning when running paravirtualized under KVM. Check which bits KVM supports pinning for each control register and only pin supported bits which are already pinned via the existing native protection. Write to KVM CR0/4 pinned MSRs to enable pinning. Initiate KVM assisted pinning directly following the setup of native pinning on boot CPU. For non-boot CPUs initiate paravirtualized pinning on CPU identification. Identification of non-boot CPUs takes place after the boot CPU has setup native CR pinning. Therefore, non-boot CPUs access pinned bits setup by the boot CPU and request that those be pinned. All CPUs request paravirtualized pinning of the same bits which are already pinned natively. Guests using the kexec system call currently do not support paravirtualized control register pinning. This is due to early boot code writing known good values to control registers, these values do not contain the protected bits. This is due to CPU feature identification being done at a later time, when the kernel properly checks if it can enable protections. As such, the pv_cr_pin command line option has been added which instructs the kernel to disable kexec in favor of enabling paravirtualized control register pinning. crashkernel is also disabled when the pv_cr_pin parameter is specified due to its reliance on kexec. When we fix kexec, we will still need a way for a kernel with support to know if the kernel it is attempting to load has support. If a kernel with this enabled attempts to kexec a kernel where this is not supported, it would trigger a fault almost immediately. Liran suggested adding a section to the built image acting as a flag to signify support for being kexec'd by a kernel with pinning enabled. Should that approach be implemented, it is likely that the command line flag (pv_cr_pin) would still be desired for some deprecation period. We wouldn't want the default behavior to change from being able to kexec older kernels to not being able to, as this might break some users workflows. 
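Condensed, the guest-side hook this patch adds boils down to the sequence
below (the wrapper name is invented for brevity; the real code is
kvm_setup_paravirt_cr_pinning() in the kvm.c hunk that follows, and the
pv_cr_pin/kexec interlock is omitted here):

/* Sketch of the guest-side request, condensed from the patch. */
static void request_cr_pinning(unsigned long cr0_bits, unsigned long cr4_bits)
{
	u64 allowed;

	if (!kvm_para_has_feature(KVM_FEATURE_CR_PIN))
		return;

	/* Only ask for bits the hypervisor advertises as pinnable. */
	rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, allowed);
	wrmsrl(MSR_KVM_CR0_PINNED_HIGH, cr0_bits & allowed);

	rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, allowed);
	wrmsrl(MSR_KVM_CR4_PINNED_HIGH, cr4_bits & allowed);
}

Both the boot CPU (from setup_cr_pinning()) and secondary CPUs (from
identify_secondary_cpu()) invoke this with X86_CR0_WP and the natively
pinned CR4 bits, so every vCPU requests the same pinning.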
Signed-off-by: John Andersen --- .../admin-guide/kernel-parameters.txt | 11 ++++++ arch/x86/Kconfig | 10 +++++ arch/x86/include/asm/kvm_para.h | 28 +++++++++++++ arch/x86/kernel/cpu/common.c | 5 +++ arch/x86/kernel/kvm.c | 39 +++++++++++++++++++ arch/x86/kernel/setup.c | 8 ++++ 6 files changed, 101 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 89386f6f3ab6..54fb2b5ab8fc 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3926,6 +3926,17 @@ [KNL] Number of legacy pty's. Overwrites compiled-in default number. + pv_cr_pin [SECURITY,X86] + Enable paravirtualized control register pinning. When + running paravirutalized under KVM, request that KVM not + allow the guest to disable kernel protection features + set in CPU control registers. Specifying this option + will disable kexec (and crashkernel). If kexec support + has not been compiled into the kernel and host KVM + supports paravirtualized control register pinning, it + will be active by default without the need to specify + this parameter. + quiet [KNL] Disable most log messages r128= [HW,DRM] diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 67f6a40b5e93..bc0b27483001 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -800,6 +800,7 @@ config KVM_GUEST bool "KVM Guest support (including kvmclock)" depends on PARAVIRT select PARAVIRT_CLOCK + select PARAVIRT_CR_PIN select ARCH_CPUIDLE_HALTPOLL default y ---help--- @@ -835,6 +836,15 @@ config PARAVIRT_TIME_ACCOUNTING config PARAVIRT_CLOCK bool +config PARAVIRT_CR_PIN + bool "Paravirtual bit pinning for CR0 and CR4" + depends on KVM_GUEST + help + Select this option to have the virtualised guest request that the + hypervisor disallow it from disabling protections set in control + registers. The hypervisor will prevent exploits from disabling + features such as SMEP, SMAP, UMIP, and WP. 
+ config JAILHOUSE_GUEST bool "Jailhouse non-root cell support" depends on X86_64 && PCI diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 57fd1966c4ea..f021531e98dc 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -112,6 +112,23 @@ static inline void kvm_spinlock_init(void) } #endif /* CONFIG_PARAVIRT_SPINLOCKS */ +#ifdef CONFIG_PARAVIRT_CR_PIN +void __init kvm_paravirt_cr_pinning_init(void); +void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits, + unsigned long cr4_pinned_bits); +#else +static inline void kvm_paravirt_cr_pinning_init(void) +{ + return; +} + +static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits, + unsigned long cr4_pinned_bits) +{ + return; +} +#endif /* CONFIG_PARAVIRT_CR_PIN */ + #else /* CONFIG_KVM_GUEST */ #define kvm_async_pf_task_wait_schedule(T) do {} while(0) #define kvm_async_pf_task_wake(T) do {} while(0) @@ -145,6 +162,17 @@ static inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token) { return false; } + +static inline void kvm_paravirt_cr_pinning_init(void) +{ + return; +} + +static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits, + unsigned long cr4_pinned_bits) +{ + return; +} #endif #endif /* _ASM_X86_KVM_PARA_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 921e67086a00..ee17223b1fa8 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include @@ -416,6 +417,8 @@ static void __init setup_cr_pinning(void) mask = (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP); cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & mask; static_key_enable(&cr_pinning.key); + + kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits); } /* @@ -1551,6 +1554,8 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c) mtrr_ap_init(); validate_apic_and_package_id(c); x86_spec_ctrl_setup_ap(); + + kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits); } static __init int setup_noclflush(char *arg) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 7e6403a8d861..def913b86a99 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -23,6 +23,8 @@ #include #include #include +#include +#include #include #include #include @@ -33,6 +35,7 @@ #include #include #include +#include DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled); @@ -723,6 +726,7 @@ static void __init kvm_apic_init(void) static void __init kvm_init_platform(void) { kvmclock_init(); + kvm_paravirt_cr_pinning_init(); x86_platform.apic_post_init = kvm_apic_init; } @@ -877,6 +881,41 @@ void __init kvm_spinlock_init(void) #endif /* CONFIG_PARAVIRT_SPINLOCKS */ +#ifdef CONFIG_PARAVIRT_CR_PIN +static int kvm_paravirt_cr_pinning_enabled __ro_after_init; + +void __init kvm_paravirt_cr_pinning_init(void) +{ +#ifdef CONFIG_KEXEC_CORE + if (!cmdline_find_option_bool(boot_command_line, "pv_cr_pin")) + return; + + /* Paravirtualized CR pinning is currently incompatible with kexec */ + kexec_load_disabled = 1; +#endif + + kvm_paravirt_cr_pinning_enabled = 1; +} + +void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits, + unsigned long cr4_pinned_bits) +{ + u64 mask; + + if (!kvm_paravirt_cr_pinning_enabled) + return; + + if (!kvm_para_has_feature(KVM_FEATURE_CR_PIN)) + return; + + rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, mask); + wrmsrl(MSR_KVM_CR0_PINNED_HIGH, cr0_pinned_bits & mask); + + rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, mask); + 
wrmsrl(MSR_KVM_CR4_PINNED_HIGH, cr4_pinned_bits & mask); +} +#endif + #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL static void kvm_disable_host_haltpoll(void *i) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index d9c678b37a9b..ed3bcc85d40d 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -27,6 +27,9 @@ #include #include #include +#include +#include + #include #include #include @@ -502,6 +505,11 @@ static void __init reserve_crashkernel(void) return; } + if (cmdline_find_option_bool(boot_command_line, "pv_cr_pin")) { + pr_info("Ignoring crashkernel since pv_cr_pin present in cmdline\n"); + return; + } + /* 0 means: find the address automatically */ if (!crash_base) { /*