From patchwork Mon Jan 18 03:28:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kai Huang X-Patchwork-Id: 12026397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C6A7C433DB for ; Mon, 18 Jan 2021 03:30:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D365224B1 for ; Mon, 18 Jan 2021 03:30:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731824AbhARDag (ORCPT ); Sun, 17 Jan 2021 22:30:36 -0500 Received: from mga12.intel.com ([192.55.52.136]:16942 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731713AbhARDaO (ORCPT ); Sun, 17 Jan 2021 22:30:14 -0500 IronPort-SDR: ASXGHcECnxAAVl4NLDShsfe2RL/nxT2a0eC+AkAn7fbSDm/oZ91JNGflMuhVZ7ScSKd801x7Jq vnMaAUmvOpow== X-IronPort-AV: E=McAfee;i="6000,8403,9867"; a="157933927" X-IronPort-AV: E=Sophos;i="5.79,355,1602572400"; d="scan'208";a="157933927" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2021 19:29:21 -0800 IronPort-SDR: ucX4PDD9Kwj+YEgE6gLArp5l4eOKegnT89+DTkTMFxmw8HaCUV1e0bCZ7RlgeFvyUU09i3GZoK 6enk0rGrcmqA== X-IronPort-AV: E=Sophos;i="5.79,355,1602572400"; d="scan'208";a="573151173" Received: from amrahman-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.252.142.253]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2021 19:29:16 -0800 From: Kai Huang To: linux-sgx@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org Cc: seanjc@google.com, jarkko@kernel.org, luto@kernel.org, dave.hansen@intel.com, haitao.huang@intel.com, pbonzini@redhat.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, jmattson@google.com, joro@8bytes.org, vkuznets@redhat.com, wanpengli@tencent.com, Kai Huang Subject: [RFC PATCH v2 22/26] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions Date: Mon, 18 Jan 2021 16:28:33 +1300 Message-Id: <8b22ca5a13a9e1dbdfedc0ae26bf7c396cc5f8c5.1610935432.git.kai.huang@intel.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Sean Christopherson Add an ECREATE handler that will be used to intercept ECREATE for the purpose of enforcing and enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e. to allow userspace to restrict SGX features via CPUID. ECREATE will be intercepted when any of the aforementioned masks diverges from hardware in order to enforce the desired CPUID model, i.e. inject #GP if the guest attempts to set a bit that hasn't been enumerated as allowed-1 in CPUID. Note, access to the PROVISIONKEY is not yet supported. Signed-off-by: Sean Christopherson Signed-off-by: Kai Huang --- arch/x86/include/asm/kvm_host.h | 3 + arch/x86/kvm/vmx/sgx.c | 243 ++++++++++++++++++++++++++++++++ 2 files changed, 246 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9581f81e62a4..cd71f30fbdd1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1000,6 +1000,9 @@ struct kvm_arch { struct msr_bitmap_range ranges[16]; } msr_filter; + /* Guest can access the SGX PROVISIONKEY. */ + bool sgx_provisioning_allowed; + struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c index 693bf7735308..4281045318ac 100644 --- a/arch/x86/kvm/vmx/sgx.c +++ b/arch/x86/kvm/vmx/sgx.c @@ -12,6 +12,247 @@ bool __read_mostly enable_sgx; +/* + * ENCLS's memory operands use a fixed segment (DS) and a fixed + * address size based on the mode. Related prefixes are ignored. + */ +static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset, + int size, int alignment, gva_t *gva) +{ + struct kvm_segment s; + bool fault; + + /* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */ + *gva = offset; + if (!is_long_mode(vcpu)) { + vmx_get_segment(vcpu, &s, VCPU_SREG_DS); + *gva += s.base; + } + + if (!IS_ALIGNED(*gva, alignment)) { + fault = true; + } else if (likely(is_long_mode(vcpu))) { + fault = is_noncanonical_address(*gva, vcpu); + } else { + *gva &= 0xffffffff; + fault = (s.unusable) || + (s.type != 2 && s.type != 3) || + (*gva > s.limit) || + ((s.base != 0 || s.limit != 0xffffffff) && + (((u64)*gva + size - 1) > s.limit + 1)); + } + if (fault) + kvm_inject_gp(vcpu, 0); + return fault ? -EINVAL : 0; +} + +static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr, + unsigned int size) +{ + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; + vcpu->run->internal.ndata = 2; + vcpu->run->internal.data[0] = addr; + vcpu->run->internal.data[1] = size; +} + +static int sgx_read_hva(struct kvm_vcpu *vcpu, unsigned long hva, void *data, + unsigned int size) +{ + if (__copy_from_user(data, (void __user *)hva, size)) { + sgx_handle_emulation_failure(vcpu, hva, size); + return -EFAULT; + } + + return 0; +} + +static int sgx_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t gva, bool write, + gpa_t *gpa) +{ + struct x86_exception ex; + + if (write) + *gpa = kvm_mmu_gva_to_gpa_write(vcpu, gva, &ex); + else + *gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex); + + if (*gpa == UNMAPPED_GVA) { + kvm_inject_emulated_page_fault(vcpu, &ex); + return -EFAULT; + } + + return 0; +} + +static int sgx_gpa_to_hva(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long *hva) +{ + *hva = kvm_vcpu_gfn_to_hva(vcpu, PFN_DOWN(gpa)); + if (kvm_is_error_hva(*hva)) { + sgx_handle_emulation_failure(vcpu, gpa, 1); + return -EFAULT; + } + + *hva |= gpa & ~PAGE_MASK; + + return 0; +} + +static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr) +{ + struct x86_exception ex; + + /* + * A non-EPCM #PF indicates a bad userspace HVA. This *should* check + * for PFEC.SGX and not assume any #PF on SGX2 originated in the EPC, + * but the error code isn't (yet) plumbed through the ENCLS helpers. + */ + if (trapnr == PF_VECTOR && !boot_cpu_has(X86_FEATURE_SGX2)) { + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; + vcpu->run->internal.ndata = 0; + return 0; + } + + /* + * If the guest thinks it's running on SGX2 hardware, inject an SGX + * #PF if the fault matches an EPCM fault signature (#GP on SGX1, + * #PF on SGX2). The assumption is that EPCM faults are much more + * likely than a bad userspace address. + */ + if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) && + guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) { + memset(&ex, 0, sizeof(ex)); + ex.vector = PF_VECTOR; + ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK | + PFERR_SGX_MASK; + ex.address = gva; + ex.error_code_valid = true; + ex.nested_page_fault = false; + kvm_inject_page_fault(vcpu, &ex); + } else { + kvm_inject_gp(vcpu, 0); + } + return 1; +} + +static int handle_encls_ecreate(struct kvm_vcpu *vcpu) +{ + unsigned long a_hva, m_hva, x_hva, s_hva, secs_hva; + struct kvm_cpuid_entry2 *sgx_12_0, *sgx_12_1; + gpa_t metadata_gpa, contents_gpa, secs_gpa; + struct sgx_pageinfo pageinfo; + gva_t pageinfo_gva, secs_gva; + u64 attributes, xfrm, size; + struct x86_exception ex; + u8 max_size_log2; + u32 miscselect; + int trapnr, r; + + sgx_12_0 = kvm_find_cpuid_entry(vcpu, 0x12, 0); + sgx_12_1 = kvm_find_cpuid_entry(vcpu, 0x12, 1); + if (!sgx_12_0 || !sgx_12_1) { + kvm_inject_gp(vcpu, 0); + return 1; + } + + if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) || + sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva)) + return 1; + + /* + * Copy the PAGEINFO to local memory, its pointers need to be + * translated, i.e. we need to do a deep copy/translate. + */ + r = kvm_read_guest_virt(vcpu, pageinfo_gva, &pageinfo, + sizeof(pageinfo), &ex); + if (r == X86EMUL_PROPAGATE_FAULT) { + kvm_inject_emulated_page_fault(vcpu, &ex); + return 1; + } else if (r != X86EMUL_CONTINUE) { + sgx_handle_emulation_failure(vcpu, pageinfo_gva, size); + return 0; + } + + /* + * Verify alignment early. This conveniently avoids having to worry + * about page splits on userspace addresses. + */ + if (!IS_ALIGNED(pageinfo.metadata, 64) || + !IS_ALIGNED(pageinfo.contents, 4096)) { + kvm_inject_gp(vcpu, 0); + return 1; + } + + /* + * Translate the SECINFO, SOURCE and SECS pointers from GVA to GPA. + * Resume the guest on failure to inject a #PF. + */ + if (sgx_gva_to_gpa(vcpu, pageinfo.metadata, false, &metadata_gpa) || + sgx_gva_to_gpa(vcpu, pageinfo.contents, false, &contents_gpa) || + sgx_gva_to_gpa(vcpu, secs_gva, true, &secs_gpa)) + return 1; + + /* + * ...and then to HVA. The order of accesses isn't architectural, i.e. + * KVM doesn't have to fully process one address at a time. Exit to + * userspace if a GPA is invalid. + */ + if (sgx_gpa_to_hva(vcpu, metadata_gpa, + (unsigned long *)&pageinfo.metadata) || + sgx_gpa_to_hva(vcpu, contents_gpa, + (unsigned long *)&pageinfo.contents) || + sgx_gpa_to_hva(vcpu, secs_gpa, &secs_hva)) + return 0; + + /* + * Read out select portions of the input SECS to enforce userspace + * restrictions on MISCSELECT, ATTRIBUTES, etc... Note, 'contents' is + * page aligned, i.e. no need to worry about page splits. + */ + m_hva = pageinfo.contents + offsetof(struct sgx_secs, miscselect); + a_hva = pageinfo.contents + offsetof(struct sgx_secs, attributes); + x_hva = pageinfo.contents + offsetof(struct sgx_secs, xfrm); + s_hva = pageinfo.contents + offsetof(struct sgx_secs, size); + + /* Exit to userspace if copying from a host userspace address fails. */ + if (sgx_read_hva(vcpu, m_hva, &miscselect, sizeof(miscselect)) || + sgx_read_hva(vcpu, a_hva, &attributes, sizeof(attributes)) || + sgx_read_hva(vcpu, x_hva, &xfrm, sizeof(xfrm)) || + sgx_read_hva(vcpu, s_hva, &size, sizeof(size))) + return 0; + + /* Enforce restriction of access to the PROVISIONKEY. */ + if (!vcpu->kvm->arch.sgx_provisioning_allowed && + (attributes & SGX_ATTR_PROVISIONKEY)) { + if (sgx_12_1->eax & SGX_ATTR_PROVISIONKEY) + pr_warn_once("KVM: SGX PROVISIONKEY advertised but not allowed\n"); + kvm_inject_gp(vcpu, 0); + return 1; + } + + /* Enforce CPUID restrictions on MISCSELECT, ATTRIBUTES and XFRM. */ + if ((u32)miscselect & ~sgx_12_0->ebx || + (u32)attributes & ~sgx_12_1->eax || + (u32)(attributes >> 32) & ~sgx_12_1->ebx || + (u32)xfrm & ~sgx_12_1->ecx || + (u32)(xfrm >> 32) & ~sgx_12_1->edx) { + kvm_inject_gp(vcpu, 0); + return 1; + } + + /* Enforce CPUID restriction on max enclave size. */ + max_size_log2 = (attributes & SGX_ATTR_MODE64BIT) ? sgx_12_0->edx >> 8 : + sgx_12_0->edx; + if (size >= BIT_ULL(max_size_log2)) + kvm_inject_gp(vcpu, 0); + + if (sgx_virt_ecreate(&pageinfo, (void __user *)secs_hva, &trapnr)) + return sgx_inject_fault(vcpu, secs_gva, trapnr); + + return kvm_skip_emulated_instruction(vcpu); +} + static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf) { if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX)) @@ -42,6 +283,8 @@ int handle_encls(struct kvm_vcpu *vcpu) } else if (!sgx_enabled_in_guest_bios(vcpu)) { kvm_inject_gp(vcpu, 0); } else { + if (leaf == ECREATE) + return handle_encls_ecreate(vcpu); WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf); vcpu->run->exit_reason = KVM_EXIT_UNKNOWN; vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;