From patchwork Wed Mar  8 18:03:54 2017
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 9611823
From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: bdas@redhat.com, dmatlack@google.com
Subject: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
Date: Wed, 8 Mar 2017 19:03:54 +0100
Message-Id: <1488996236-5676-5-git-send-email-pbonzini@redhat.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1488996236-5676-1-git-send-email-pbonzini@redhat.com>
References: <1488996236-5676-1-git-send-email-pbonzini@redhat.com>

Now use bit 6 of EPTP to optionally enable A/D bits for nested EPT.
Another thing to change is that, when EPT accessed and dirty bits are not
in use, VMX treats accesses to guest paging structures as data reads.
When they are in use (bit 6 of EPTP is set), they are treated as writes
and the corresponding EPT dirty bit is set.  The MMU didn't know this
detail, so this patch adds it.

We also have to fix up the exit qualification.  It may be wrong because
KVM sets bit 6 but the guest might not.

L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though they are not
available in hardware.  The problem is that guest page-table walks would
then be treated as reads rather than writes, so they would not cause an
EPT violation.  Therefore, this patch requires host support for the
feature.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/kvm/mmu.c              |  4 +++-
 arch/x86/kvm/mmu.h              |  3 ++-
 arch/x86/kvm/paging_tmpl.h      | 33 ++++++++++++++++-----------------
 arch/x86/kvm/vmx.c              | 32 ++++++++++++++++++++++++++++----
 5 files changed, 52 insertions(+), 25 deletions(-)
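Reviewer note, not part of the patch: the MMU-side change is easiest to see in
isolation with the stand-alone sketch below.  The helper name nested_access_for()
is invented for illustration; the constants are redefined locally (with the same
values KVM uses for PFERR_* and VMX_EPT_AD_ENABLE_BIT) so the snippet builds and
runs on its own.

/* Illustrative sketch only: the new nested_access computation in
 * FNAME(walk_addr_generic), modelled in user space.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VMX_EPT_AD_ENABLE_BIT   (1ULL << 6)     /* bit 6 of EPTP */
#define PFERR_WRITE_MASK        (1U << 1)
#define PFERR_USER_MASK         (1U << 2)

/* Permissions used when walking L1's EPT tables on behalf of L2. */
static unsigned nested_access_for(uint64_t eptp)
{
        bool have_ad = eptp & VMX_EPT_AD_ENABLE_BIT;

        /* When L1 enables A/D, hardware treats accesses to the guest paging
         * structures as writes (it sets EPT dirty bits for them), so the
         * emulated walk must ask for write permission; otherwise a user-mode
         * read is enough.
         */
        return (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
}

int main(void)
{
        uint64_t eptp_plain = 0x12345000;
        uint64_t eptp_ad    = 0x12345000 | VMX_EPT_AD_ENABLE_BIT;

        printf("A/D off: access = %#x (user read)\n",  nested_access_for(eptp_plain));
        printf("A/D on:  access = %#x (user write)\n", nested_access_for(eptp_ad));
        return 0;
}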
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74ef58c8ff53..7dbb8d622683 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -343,9 +343,10 @@ struct kvm_mmu {
 	void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   u64 *spte, const void *pte);
 	hpa_t root_hpa;
-	int root_level;
-	int shadow_root_level;
 	union kvm_mmu_page_role base_role;
+	u8 root_level;
+	u8 shadow_root_level;
+	u8 ept_ad;
 	bool direct_map;
 
 	/*
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac7810513d0e..558676538fca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4340,7 +4340,8 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
-void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly)
+void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
+			     bool accessed_dirty)
 {
 	struct kvm_mmu *context = &vcpu->arch.mmu;
 
@@ -4349,6 +4350,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly)
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 
 	context->nx = true;
+	context->ept_ad = accessed_dirty;
 	context->page_fault = ept_page_fault;
 	context->gva_to_gpa = ept_gva_to_gpa;
 	context->sync_page = ept_sync_page;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ddc56e91f2e4..d8ccb32f7308 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -74,7 +74,8 @@ enum {
 
 int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu);
-void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly);
+void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
+			     bool accessed_dirty);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3e20f7b33892..8bf829703a00 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -23,13 +23,6 @@
  * so the code in this file is compiled twice, once per pte size.
  */
 
-/*
- * This is used to catch non optimized PT_GUEST_(DIRTY|ACCESS)_SHIFT macro
- * uses for EPT without A/D paging type.
- */
-extern u64 __pure __using_nonexistent_pte_bit(void)
-	       __compiletime_error("wrong use of PT_GUEST_(DIRTY|ACCESS)_SHIFT");
-
 #if PTTYPE == 64
 	#define pt_element_t u64
 	#define guest_walker guest_walker64
@@ -39,8 +32,6 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
 	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT64_LEVEL_BITS
-	#define PT_GUEST_ACCESSED_MASK PT_ACCESSED_MASK
-	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
@@ -61,8 +52,6 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_INDEX(addr, level) PT32_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT32_LEVEL_BITS
 	#define PT_MAX_FULL_LEVELS 2
-	#define PT_GUEST_ACCESSED_MASK PT_ACCESSED_MASK
-	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
@@ -76,17 +65,18 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
 	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT64_LEVEL_BITS
-	#define PT_GUEST_ACCESSED_MASK 0
-	#define PT_GUEST_DIRTY_MASK 0
-	#define PT_GUEST_DIRTY_SHIFT __using_nonexistent_pte_bit()
-	#define PT_GUEST_ACCESSED_SHIFT __using_nonexistent_pte_bit()
-	#define PT_HAVE_ACCESSED_DIRTY(mmu) false
+	#define PT_GUEST_DIRTY_SHIFT 9
+	#define PT_GUEST_ACCESSED_SHIFT 8
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
 	#define CMPXCHG cmpxchg64
 	#define PT_MAX_FULL_LEVELS 4
 #else
 	#error Invalid PTTYPE value
 #endif
 
+#define PT_GUEST_DIRTY_MASK    (1 << PT_GUEST_DIRTY_SHIFT)
+#define PT_GUEST_ACCESSED_MASK (1 << PT_GUEST_ACCESSED_SHIFT)
+
 #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl)
 #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL)
 
@@ -290,6 +280,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	pt_element_t __user *uninitialized_var(ptep_user);
 	gfn_t table_gfn;
 	unsigned index, pt_access, pte_access, accessed_dirty, pte_pkey;
+	unsigned nested_access;
 	gpa_t pte_gpa;
 	bool have_ad;
 	int offset;
@@ -319,6 +310,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
 
 	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
+
+	/*
+	 * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
+	 * by the MOV to CR instruction are treated as reads and do not cause the
+	 * processor to set the dirty flag in any EPT paging-structure entry.
+	 */
+	nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
+
 	pt_access = pte_access = ACC_ALL;
 	++walker->level;
 
@@ -338,7 +337,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
-					      PFERR_USER_MASK|PFERR_WRITE_MASK,
+					      nested_access,
 					      &walker->fault);
 
 		/*
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cb9034601c7b..a3395e23cf5a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2772,6 +2772,8 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
 			 VMX_EPT_EXTENT_CONTEXT_BIT | VMX_EPT_2MB_PAGE_BIT |
 			 VMX_EPT_1GB_PAGE_BIT;
+		if (enable_ept_ad_bits)
+			vmx->nested.nested_vmx_ept_caps |= VMX_EPT_AD_BIT;
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
@@ -6247,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
+	if (is_guest_mode(vcpu) && !(exit_qualification & 0x100)) {
+		/*
+		 * Fix up exit_qualification according to whether guest
+		 * page table accesses are reads or writes.
+		 */
+		u64 eptp = nested_ept_get_cr3(vcpu);
+		exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
+		if (eptp & VMX_EPT_AD_ENABLE_BIT)
+			exit_qualification |= EPT_VIOLATION_ACC_WRITE;
+	}
+
 	/*
 	 * EPT violation happened while executing iret from NMI,
 	 * "blocked by NMI" bit has to be set before next VM entry.
@@ -9490,17 +9503,25 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
 	return get_vmcs12(vcpu)->ept_pointer;
 }
 
-static void nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
+	u64 eptp;
+
 	WARN_ON(mmu_is_nested(vcpu));
+	eptp = nested_ept_get_cr3(vcpu);
+	if ((eptp & VMX_EPT_AD_ENABLE_BIT) && !enable_ept_ad_bits)
+		return 1;
+
 	kvm_init_shadow_ept_mmu(vcpu,
 			to_vmx(vcpu)->nested.nested_vmx_ept_caps &
-			VMX_EPT_EXECUTE_ONLY_BIT);
+			VMX_EPT_EXECUTE_ONLY_BIT,
+			eptp & VMX_EPT_AD_ENABLE_BIT);
 	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
 	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
 	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
 
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
+	return 0;
 }
 
 static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
@@ -10267,7 +10289,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 
 	if (nested_cpu_has_ept(vmcs12)) {
 		kvm_mmu_unload(vcpu);
-		nested_ept_init_mmu_context(vcpu);
+		if (nested_ept_init_mmu_context(vcpu)) {
+			*entry_failure_code = ENTRY_FAIL_DEFAULT;
+			return 1;
+		}
 	}
 
 	/*
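For completeness, here is a stand-alone sketch of what the exit-qualification
fix-up in handle_ept_violation() amounts to.  It is illustrative only and not
part of the patch; fixup_exit_qualification() and EPT_VIOLATION_PAGING_STRUCT
are names invented for the example, the latter standing for the 0x100 bit
tested above.  The idea: KVM runs L2 with EPT A/D bits enabled whenever the
host supports them, so hardware reports guest page-table-walk accesses as
writes; the write bit must therefore be recomputed from L1's EPTP before the
violation is reflected to L1.

/* Illustrative sketch only, modelled in user space. */
#include <stdint.h>
#include <stdio.h>

#define EPT_VIOLATION_ACC_WRITE      (1ULL << 1)  /* exit qualification bit 1 */
#define EPT_VIOLATION_PAGING_STRUCT  (1ULL << 8)  /* clear = access was to a paging-structure entry */
#define VMX_EPT_AD_ENABLE_BIT        (1ULL << 6)  /* bit 6 of L1's EPTP */

static uint64_t fixup_exit_qualification(uint64_t exit_qual, uint64_t l1_eptp)
{
        /* Only accesses to the guest's paging structures need fixing up. */
        if (exit_qual & EPT_VIOLATION_PAGING_STRUCT)
                return exit_qual;

        /* Hardware saw the walk as a write (L0 enables A/D); report it to L1
         * as a write only if L1 enabled A/D in its own EPTP.
         */
        exit_qual &= ~EPT_VIOLATION_ACC_WRITE;
        if (l1_eptp & VMX_EPT_AD_ENABLE_BIT)
                exit_qual |= EPT_VIOLATION_ACC_WRITE;
        return exit_qual;
}

int main(void)
{
        uint64_t qual = EPT_VIOLATION_ACC_WRITE;  /* walk reported as a write */

        printf("L1 without A/D: %#llx\n",
               (unsigned long long)fixup_exit_qualification(qual, 0x1000));
        printf("L1 with A/D:    %#llx\n",
               (unsigned long long)fixup_exit_qualification(qual, 0x1000 | VMX_EPT_AD_ENABLE_BIT));
        return 0;
}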