From patchwork Fri Jul 31 21:23:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FD6413B1 for ; Fri, 31 Jul 2020 21:24:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EE35121744 for ; Fri, 31 Jul 2020 21:24:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728642AbgGaVX1 (ORCPT ); Fri, 31 Jul 2020 17:23:27 -0400 Received: from mga14.intel.com ([192.55.52.115]:50224 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728163AbgGaVX1 (ORCPT ); Fri, 31 Jul 2020 17:23:27 -0400 IronPort-SDR: HijNdAYRtKmE9g3tcHzpEd+Rjq9M0BkBonaN4mVbjeCGUmjmX8APcbxKDtoxXg9nAiOI4uwndI s/umPP1CZZug== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075127" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075127" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:26 -0700 IronPort-SDR: oTBERTbqHkpkNNTA9L4r9QbNT7cLdo4vviEhrdixBuUuB1suq+jhUh48eJQdDkxK/zcCQOuEC1 JgHtVsq6+8cQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191296" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 1/8] KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits() Date: Fri, 31 Jul 2020 14:23:16 -0700 Message-Id: <20200731212323.21746-2-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Return the old SPTE when clearing a SPTE and push the "old SPTE present" check to the caller. Tracking pinned SPTEs will use the old SPTE in rmap_remove() to determine whether or not the SPTE is pinned. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 289dddff2615f..d737042fea55e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -930,9 +930,9 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) * Rules for using mmu_spte_clear_track_bits: * It sets the sptep from present to nonpresent, and track the * state bits, it is used to clear the last level sptep. - * Returns non-zero if the PTE was previously valid. + * Returns the old PTE. 
*/ -static int mmu_spte_clear_track_bits(u64 *sptep) +static u64 mmu_spte_clear_track_bits(u64 *sptep) { kvm_pfn_t pfn; u64 old_spte = *sptep; @@ -943,7 +943,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) old_spte = __update_clear_spte_slow(sptep, 0ull); if (!is_shadow_present_pte(old_spte)) - return 0; + return old_spte; pfn = spte_to_pfn(old_spte); @@ -960,7 +960,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) if (is_dirty_spte(old_spte)) kvm_set_pfn_dirty(pfn); - return 1; + return old_spte; } /* @@ -1484,7 +1484,9 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) static void drop_spte(struct kvm *kvm, u64 *sptep) { - if (mmu_spte_clear_track_bits(sptep)) + u64 old_spte = mmu_spte_clear_track_bits(sptep); + + if (is_shadow_present_pte(old_spte)) rmap_remove(kvm, sptep); } From patchwork Fri Jul 31 21:23:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7D6391575 for ; Fri, 31 Jul 2020 21:24:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7127221744 for ; Fri, 31 Jul 2020 21:24:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729823AbgGaVYA (ORCPT ); Fri, 31 Jul 2020 17:24:00 -0400 Received: from mga14.intel.com ([192.55.52.115]:50224 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728186AbgGaVX1 (ORCPT ); Fri, 31 Jul 2020 17:23:27 -0400 IronPort-SDR: tZgs9q8IUzK8Ad1NG6weuFbnExjtjlVbyBHxdI/hpbNV2aYGGrJcBe5BlCHAzmC1RWuvNp7iqO qKDqrNw2SPqw== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075128" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075128" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: XgiYUcAvmq0d0KmB7mBPZFVTJZBsdfcE9u47qGuK/O4qUyua/RAdqnnhdnPfjJbgAJA4QwNSaH nPhpyV9KSfUw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191298" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 2/8] KVM: x86/mmu: Use bits 2:0 to check for present SPTEs Date: Fri, 31 Jul 2020 14:23:17 -0700 Message-Id: <20200731212323.21746-3-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Use what are effectively EPT's RWX bits to detect present SPTEs instead of simply looking for a non-zero value. This will allow using a non-zero initial value for SPTEs as well as using not-present SPTEs to track metadata for zapped private SPTEs. 
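A minimal standalone illustration of the difference between the two checks, using bit 62 as a stand-in for a software-available metadata bit (the same bit a later patch in this series claims for pinning); note the real helper additionally excludes MMIO SPTEs in both the old and new versions:

#include <stdint.h>
#include <stdio.h>

/* Old check: any non-zero SPTE was treated as present. */
static int present_old(uint64_t pte)
{
	return pte != 0;
}

/* New check: only bits 2:0 (EPT's RWX equivalents) mean "present". */
static int present_new(uint64_t pte)
{
	return (pte & 0x7) != 0;
}

int main(void)
{
	/* Not-present SPTE that still carries software metadata in bit 62. */
	uint64_t zapped_with_metadata = 1ULL << 62;
	/* Present leaf SPTE: a PFN plus at least one of bits 2:0. */
	uint64_t present_leaf = (0x12345ULL << 12) | 0x7;

	printf("metadata-only: old=%d new=%d\n",
	       present_old(zapped_with_metadata),
	       present_new(zapped_with_metadata));
	printf("present leaf:  old=%d new=%d\n",
	       present_old(present_leaf), present_new(present_leaf));
	return 0;
}

The old check misreports the metadata-only SPTE as present (old=1), while the new check correctly reports it as not-present (new=0).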
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 9 +++++++-- arch/x86/kvm/mmu/paging_tmpl.h | 3 ++- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d737042fea55e..82f69a7456004 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -625,9 +625,14 @@ static int is_nx(struct kvm_vcpu *vcpu) return vcpu->arch.efer & EFER_NX; } -static int is_shadow_present_pte(u64 pte) +static inline bool __is_shadow_present_pte(u64 pte) { - return (pte != 0) && !is_mmio_spte(pte); + return !!(pte & 0x7); +} + +static bool is_shadow_present_pte(u64 pte) +{ + return __is_shadow_present_pte(pte) && !is_mmio_spte(pte); } static int is_large_pte(u64 pte) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 0172a949f6a75..57813e92ea8e0 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1024,7 +1024,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) gpa_t pte_gpa; gfn_t gfn; - if (!sp->spt[i]) + if (!__is_shadow_present_pte(sp->spt[i]) && + !is_mmio_spte(sp->spt[i])) continue; pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); From patchwork Fri Jul 31 21:23:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 69D7113B1 for ; Fri, 31 Jul 2020 21:23:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 519C02087C for ; Fri, 31 Jul 2020 21:23:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729079AbgGaVX2 (ORCPT ); Fri, 31 Jul 2020 17:23:28 -0400 Received: from mga14.intel.com ([192.55.52.115]:50224 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728655AbgGaVX2 (ORCPT ); Fri, 31 Jul 2020 17:23:28 -0400 IronPort-SDR: RcyqnpSecoO/Ccf9TaeG6VdztrKD+CF578Zgm+BOiHrpDQZgyFOTz6QrkEtf1hLJvP6+Ia8/On ZIK3n9Ctg60Q== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075129" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075129" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: RSAMy2OpJI1Dt0mMuQewKa8YeQBi06oPFkDh8y7woTwL97jtt2H4A+yYsdoNwAFpV7Wy3ntjF+ wlx3vGuGYHyA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191301" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 3/8] KVM: x86/mmu: Refactor handling of not-present SPTEs in mmu_set_spte() Date: Fri, 31 Jul 2020 14:23:18 -0700 Message-Id: <20200731212323.21746-4-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: 
bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Return early from mmu_set_spte() if the new SPTE is not-present so as to reduce the indentation of the code that performs metadata updates, e.g. rmap manipulation. Additional metadata updates will soon follow... Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 82f69a7456004..182f398036248 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3126,12 +3126,14 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (!was_rmapped && is_large_pte(*sptep)) ++vcpu->kvm->stat.lpages; - if (is_shadow_present_pte(*sptep)) { - if (!was_rmapped) { - rmap_count = rmap_add(vcpu, sptep, gfn); - if (rmap_count > RMAP_RECYCLE_THRESHOLD) - rmap_recycle(vcpu, sptep, gfn); - } + /* No additional tracking necessary for not-present SPTEs. */ + if (!is_shadow_present_pte(*sptep)) + return ret; + + if (!was_rmapped) { + rmap_count = rmap_add(vcpu, sptep, gfn); + if (rmap_count > RMAP_RECYCLE_THRESHOLD) + rmap_recycle(vcpu, sptep, gfn); } return ret; From patchwork Fri Jul 31 21:23:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6227314B7 for ; Fri, 31 Jul 2020 21:23:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4D63B22B3F for ; Fri, 31 Jul 2020 21:23:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729635AbgGaVXh (ORCPT ); Fri, 31 Jul 2020 17:23:37 -0400 Received: from mga14.intel.com ([192.55.52.115]:50224 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728163AbgGaVX3 (ORCPT ); Fri, 31 Jul 2020 17:23:29 -0400 IronPort-SDR: t9we91MTc4fYZIVkp+B0V0VSVMJQxIvjXH1WOizERGiEnKnoOOue9IPQBe00JTmm4ENEXkFJ3q CI0TMCZU1lig== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075130" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075130" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: 2YkxLunpQZauh/yQCbzPCFkS86bbTRCTTyschYDC27EpCAGA9l89xiiKJtIXrTZP5KiiRqwwqk ET8nkNbeMUdQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191305" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 4/8] KVM: x86/mmu: Add infrastructure for pinning PFNs on demand Date: Fri, 31 Jul 2020 14:23:19 -0700 Message-Id: <20200731212323.21746-5-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org 
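The diff below adds four optional kvm_x86_ops hooks (pin_spte, drop_pinned_spte, zap_pinned_spte, unzap_pinned_spte) and a software-available SPTE_PINNED_MASK bit (bit 62), so that zapping a pinned SPTE clears its present bits but preserves the pinned flag and the PFN; the next fault on that GFN is then handled as RET_PF_UNZAPPED. A standalone sketch of that encoding, with PAGE_SHIFT and a simplified address mask assumed purely for illustration:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT		12
#define SPTE_PINNED_MASK	(1ULL << 62)
/* Simplified stand-in for KVM's PT64_BASE_ADDR_MASK (bits 51:12). */
#define SPTE_PFN_MASK		(((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

static int is_present(uint64_t spte)	{ return (spte & 0x7) != 0; }
static int is_pinned(uint64_t spte)	{ return (spte & SPTE_PINNED_MASK) != 0; }
static uint64_t spte_to_pfn(uint64_t spte) { return (spte & SPTE_PFN_MASK) >> PAGE_SHIFT; }

int main(void)
{
	uint64_t pfn = 0x12345;
	/* Live pinned leaf SPTE: RWX-style bits, the PFN, and the pinned flag. */
	uint64_t live = SPTE_PINNED_MASK | (pfn << PAGE_SHIFT) | 0x7;
	/* Zapped form: only the pinned flag and the PFN survive. */
	uint64_t zapped = SPTE_PINNED_MASK | (pfn << PAGE_SHIFT);

	printf("live:   present=%d pinned=%d pfn=%#llx\n", is_present(live),
	       is_pinned(live), (unsigned long long)spte_to_pfn(live));
	printf("zapped: present=%d pinned=%d pfn=%#llx\n", is_present(zapped),
	       is_pinned(zapped), (unsigned long long)spte_to_pfn(zapped));
	return 0;
}

Because the zapped form is non-zero but fails the present check, the rmap entry can be kept and the original PFN recovered when the SPTE is later unzapped.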
Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 7 ++ arch/x86/kvm/mmu/mmu.c | 111 ++++++++++++++++++++++++++------ 2 files changed, 99 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1bab87a444d78..b14864f3e8e74 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1138,6 +1138,13 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, unsigned long cr3); + bool (*pin_spte)(struct kvm_vcpu *vcpu, gfn_t gfn, int level, + kvm_pfn_t pfn); + void (*drop_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level, + kvm_pfn_t pfn); + void (*zap_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level); + void (*unzap_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level); + bool (*has_wbinvd_exit)(void); /* Returns actual tsc_offset set in active VMCS */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 182f398036248..cab3b2f2f49c3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -133,6 +133,9 @@ module_param(dbg, bool, 0644); #define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52) #define SPTE_MMIO_MASK (3ULL << 52) +/* Special SPTEs flags that can only be used for non-MMIO SPTEs. */ +#define SPTE_PINNED_MASK BIT_ULL(62) + #define PT64_LEVEL_BITS 9 #define PT64_LEVEL_SHIFT(level) \ @@ -211,6 +214,7 @@ enum { RET_PF_EMULATE = 1, RET_PF_INVALID = 2, RET_PF_FIXED = 3, + RET_PF_UNZAPPED = 4, }; struct pte_list_desc { @@ -635,6 +639,11 @@ static bool is_shadow_present_pte(u64 pte) return __is_shadow_present_pte(pte) && !is_mmio_spte(pte); } +static bool is_pinned_pte(u64 pte) +{ + return !!(pte & SPTE_PINNED_MASK); +} + static int is_large_pte(u64 pte) { return pte & PT_PAGE_SIZE_MASK; @@ -937,15 +946,15 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) * state bits, it is used to clear the last level sptep. * Returns the old PTE. 
*/ -static u64 mmu_spte_clear_track_bits(u64 *sptep) +static u64 __mmu_spte_clear_track_bits(u64 *sptep, u64 clear_value) { kvm_pfn_t pfn; u64 old_spte = *sptep; if (!spte_has_volatile_bits(old_spte)) - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, clear_value); else - old_spte = __update_clear_spte_slow(sptep, 0ull); + old_spte = __update_clear_spte_slow(sptep, clear_value); if (!is_shadow_present_pte(old_spte)) return old_spte; @@ -968,6 +977,11 @@ static u64 mmu_spte_clear_track_bits(u64 *sptep) return old_spte; } +static inline u64 mmu_spte_clear_track_bits(u64 *sptep) +{ + return __mmu_spte_clear_track_bits(sptep, 0ull); +} + /* * Rules for using mmu_spte_clear_no_track: * Directly clear spte without caring the state bits of sptep, @@ -1399,7 +1413,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) return pte_list_add(vcpu, spte, rmap_head); } -static void rmap_remove(struct kvm *kvm, u64 *spte) +static void rmap_remove(struct kvm *kvm, u64 *spte, u64 old_spte) { struct kvm_mmu_page *sp; gfn_t gfn; @@ -1409,6 +1423,10 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); rmap_head = gfn_to_rmap(kvm, gfn, sp); __pte_list_remove(spte, rmap_head); + + if (is_pinned_pte(old_spte)) + kvm_x86_ops.drop_pinned_spte(kvm, gfn, sp->role.level - 1, + spte_to_pfn(old_spte)); } /* @@ -1446,7 +1464,7 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, iter->pos = 0; sptep = iter->desc->sptes[iter->pos]; out: - BUG_ON(!is_shadow_present_pte(*sptep)); + BUG_ON(!is_shadow_present_pte(*sptep) && !is_pinned_pte(*sptep)); return sptep; } @@ -1491,8 +1509,8 @@ static void drop_spte(struct kvm *kvm, u64 *sptep) { u64 old_spte = mmu_spte_clear_track_bits(sptep); - if (is_shadow_present_pte(old_spte)) - rmap_remove(kvm, sptep); + if (is_shadow_present_pte(old_spte) || is_pinned_pte(old_spte)) + rmap_remove(kvm, sptep, old_spte); } @@ -1730,17 +1748,49 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn) return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn); } +static bool kvm_mmu_zap_pinned_spte(struct kvm *kvm, u64 *sptep) +{ + struct kvm_mmu_page *sp; + kvm_pfn_t pfn; + gfn_t gfn; + + if (!(*sptep & SPTE_PINNED_MASK)) + return false; + + sp = sptep_to_sp(sptep); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt); + pfn = spte_to_pfn(*sptep); + + if (kvm_x86_ops.zap_pinned_spte) + kvm_x86_ops.zap_pinned_spte(kvm, gfn, sp->role.level - 1); + + __mmu_spte_clear_track_bits(sptep, SPTE_PINNED_MASK | pfn << PAGE_SHIFT); + return true; +} + static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head) { u64 *sptep; struct rmap_iterator iter; bool flush = false; - while ((sptep = rmap_get_first(rmap_head, &iter))) { +restart: + for_each_rmap_spte(rmap_head, &iter, sptep) { rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep); + if (!is_shadow_present_pte(*sptep)) { + WARN_ON_ONCE(!is_pinned_pte(*sptep)); + continue; + } + + flush = true; + + /* Keep the rmap if the SPTE is pinned. */ + if (kvm_mmu_zap_pinned_spte(kvm, sptep)) + continue; + pte_list_remove(rmap_head, sptep); - flush = true; + goto restart; } return flush; @@ -1774,6 +1824,10 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, need_flush = 1; + /* Pinned pages should not be relocated (obviously). 
*/ + if (WARN_ON_ONCE(is_pinned_pte(*sptep))) + continue; + if (pte_write(*ptep)) { pte_list_remove(rmap_head, sptep); goto restart; @@ -2630,7 +2684,7 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, struct kvm_mmu_page *child; pte = *spte; - if (is_shadow_present_pte(pte)) { + if (is_shadow_present_pte(pte) || is_pinned_pte(pte)) { if (is_last_spte(pte, sp->role.level)) { drop_spte(kvm, spte); if (is_large_pte(pte)) @@ -2639,7 +2693,7 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, child = to_shadow_page(pte & PT64_BASE_ADDR_MASK); drop_parent_pte(child, spte); } - return true; + return is_shadow_present_pte(pte); } if (is_mmio_spte(pte)) @@ -2987,10 +3041,13 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte = 0; int ret = 0; struct kvm_mmu_page *sp; + bool is_mmio_pfn; if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access)) return 0; + is_mmio_pfn = kvm_is_mmio_pfn(pfn); + sp = sptep_to_sp(sptep); if (sp_ad_disabled(sp)) spte |= SPTE_AD_DISABLED_MASK; @@ -3023,15 +3080,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (level > PG_LEVEL_4K) spte |= PT_PAGE_SIZE_MASK; if (tdp_enabled) - spte |= kvm_x86_ops.get_mt_mask(vcpu, gfn, - kvm_is_mmio_pfn(pfn)); + spte |= kvm_x86_ops.get_mt_mask(vcpu, gfn, is_mmio_pfn); if (host_writable) spte |= SPTE_HOST_WRITEABLE; else pte_access &= ~ACC_WRITE_MASK; - if (!kvm_is_mmio_pfn(pfn)) + if (!is_mmio_pfn) spte |= shadow_me_mask; spte |= (u64)pfn << PAGE_SHIFT; @@ -3065,6 +3121,12 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (speculative) spte = mark_spte_for_access_track(spte); + if (is_pinned_pte(*sptep) || + (vcpu->arch.mmu->direct_map && !is_mmio_pfn && + kvm_x86_ops.pin_spte && + kvm_x86_ops.pin_spte(vcpu, gfn, level, pfn))) + spte |= SPTE_PINNED_MASK; + set_pte: if (mmu_spte_update(sptep, spte)) ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH; @@ -3081,29 +3143,33 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, int set_spte_ret; int ret = RET_PF_FIXED; bool flush = false; + u64 pte = *sptep; pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); - if (is_shadow_present_pte(*sptep)) { + if (is_shadow_present_pte(pte)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. */ - if (level > PG_LEVEL_4K && !is_large_pte(*sptep)) { + if (level > PG_LEVEL_4K && !is_large_pte(pte)) { struct kvm_mmu_page *child; - u64 pte = *sptep; child = to_shadow_page(pte & PT64_BASE_ADDR_MASK); drop_parent_pte(child, sptep); flush = true; - } else if (pfn != spte_to_pfn(*sptep)) { + } else if (pfn != spte_to_pfn(pte)) { pgprintk("hfn old %llx new %llx\n", - spte_to_pfn(*sptep), pfn); + spte_to_pfn(pte), pfn); drop_spte(vcpu->kvm, sptep); flush = true; } else was_rmapped = 1; + } else if (is_pinned_pte(pte)) { + WARN_ON_ONCE(pfn != spte_to_pfn(pte)); + ret = RET_PF_UNZAPPED; + was_rmapped = 1; } set_spte_ret = set_spte(vcpu, sptep, pte_access, level, gfn, pfn, @@ -3136,6 +3202,9 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, rmap_recycle(vcpu, sptep, gfn); } + if (ret == RET_PF_UNZAPPED && kvm_x86_ops.unzap_pinned_spte) + kvm_x86_ops.unzap_pinned_spte(vcpu->kvm, gfn, level - 1); + return ret; } @@ -5921,6 +5990,10 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm, sp = sptep_to_sp(sptep); pfn = spte_to_pfn(*sptep); + /* Pinned page dirty logging is not supported. 
*/ + if (WARN_ON_ONCE(is_pinned_pte(*sptep))) + continue; + /* * We cannot do huge page mapping for indirect shadow pages, * which are found on the last rmap (level = 1) when not using From patchwork Fri Jul 31 21:23:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3015F13B1 for ; Fri, 31 Jul 2020 21:23:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 228B22087C for ; Fri, 31 Jul 2020 21:23:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729146AbgGaVX3 (ORCPT ); Fri, 31 Jul 2020 17:23:29 -0400 Received: from mga14.intel.com ([192.55.52.115]:50227 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728950AbgGaVX3 (ORCPT ); Fri, 31 Jul 2020 17:23:29 -0400 IronPort-SDR: 0hVkYU3aPNT2LRzsPdvexGqZ08NSnATDz7OEIW0jTQXlCdoQE7brbjJH9j5y6RpmW6Gt2gzq3B 4L27yS7wOWlQ== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075131" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075131" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: FxnBvKfSaMOxsQS0SIA0SbI2MiRb3HdU0ywlypcOM9y16r0V9SdDurFsfBCMmQCZrv+sZbBylU f8JMTscKh5AA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191308" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 5/8] KVM: SVM: Use the KVM MMU SPTE pinning hooks to pin pages on demand Date: Fri, 31 Jul 2020 14:23:20 -0700 Message-Id: <20200731212323.21746-6-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Cc: eric van tassell Cc: Tom Lendacky Signed-off-by: Sean Christopherson --- arch/x86/kvm/svm/sev.c | 24 ++++++++++++++++++++++++ arch/x86/kvm/svm/svm.c | 3 +++ arch/x86/kvm/svm/svm.h | 3 +++ 3 files changed, 30 insertions(+) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index f7f1f4ecf08e3..f640b8beb443e 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1193,3 +1193,27 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu) svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID; vmcb_mark_dirty(svm->vmcb, VMCB_ASID); } + +bool sev_pin_spte(struct kvm_vcpu *vcpu, gfn_t gfn, int level, kvm_pfn_t pfn) +{ + if (!sev_guest(vcpu->kvm)) + return false; + + get_page(pfn_to_page(pfn)); + + /* + * Flush any cached lines of the page being added since "ownership" of + * it will be transferred from the host to an encrypted guest. 
+ */ + clflush_cache_range(__va(pfn << PAGE_SHIFT), page_level_size(level)); + + return true; +} + +void sev_drop_pinned_spte(struct kvm *kvm, gfn_t gfn, int level, kvm_pfn_t pfn) +{ + if (WARN_ON_ONCE(!sev_guest(kvm))) + return; + + put_page(pfn_to_page(pfn)); +} diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 0ae20af3a1677..a9f7515b4eff3 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4150,6 +4150,9 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .need_emulation_on_page_fault = svm_need_emulation_on_page_fault, .apic_init_signal_blocked = svm_apic_init_signal_blocked, + + .pin_spte = sev_pin_spte, + .drop_pinned_spte = sev_drop_pinned_spte, }; static struct kvm_x86_init_ops svm_init_ops __initdata = { diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index a798e17317094..3060e3e529cbc 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -489,4 +489,7 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu); int __init sev_hardware_setup(void); void sev_hardware_teardown(void); +bool sev_pin_spte(struct kvm_vcpu *vcpu, gfn_t gfn, int level, kvm_pfn_t pfn); +void sev_drop_pinned_spte(struct kvm *kvm, gfn_t gfn, int level, kvm_pfn_t pfn); + #endif From patchwork Fri Jul 31 21:23:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695495 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F18C614B7 for ; Fri, 31 Jul 2020 21:23:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E413622B3F for ; Fri, 31 Jul 2020 21:23:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726571AbgGaVXg (ORCPT ); Fri, 31 Jul 2020 17:23:36 -0400 Received: from mga14.intel.com ([192.55.52.115]:50227 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728655AbgGaVX3 (ORCPT ); Fri, 31 Jul 2020 17:23:29 -0400 IronPort-SDR: 6U13IaOWIutveJZigExAcIaX+/xN9fxwDw2b9eZyQD0mwUtcTqVENFG4APuxcKr3Es883OV1zB 0C8Xzx/56SsQ== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075132" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075132" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: 0dxITT+mw/DBoc2cD0a1KPHakQR7T57+QAoK254zJ3Nu/6obst6UuBtT9jiCX5LntxjXmx4SjO 70KVutgnY2sw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191311" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 6/8] KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault() Date: Fri, 31 Jul 2020 14:23:21 -0700 Message-Id: <20200731212323.21746-7-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: 
kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When adding pages prior to boot, SEV needs to pin the resulting host pfn so that the pages that are consumed by sev_launch_update_data() are not moved after the memory is encrypted, which would corrupt the guest data. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index cab3b2f2f49c3..92b133d7b1713 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4156,7 +4156,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, } static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, - bool prefault, int max_level, bool is_tdp) + bool prefault, int max_level, bool is_tdp, + kvm_pfn_t *pfn) { bool write = error_code & PFERR_WRITE_MASK; bool exec = error_code & PFERR_FETCH_MASK; @@ -4165,7 +4166,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, gfn_t gfn = gpa >> PAGE_SHIFT; unsigned long mmu_seq; - kvm_pfn_t pfn; int r; if (page_fault_handle_page_track(vcpu, error_code, gfn)) @@ -4184,10 +4184,10 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) + if (try_async_pf(vcpu, prefault, gfn, gpa, pfn, write, &map_writable)) return RET_PF_RETRY; - if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r)) + if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, *pfn, ACC_ALL, &r)) return r; r = RET_PF_RETRY; @@ -4197,23 +4197,25 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, r = make_mmu_pages_available(vcpu); if (r) goto out_unlock; - r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn, + r = __direct_map(vcpu, gpa, write, map_writable, max_level, *pfn, prefault, is_tdp && lpage_disallowed); out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(pfn); + kvm_release_pfn_clean(*pfn); return r; } static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { + kvm_pfn_t pfn; + pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); /* This path builds a PAE pagetable, we can map 2mb pages at maximum. 
*/ return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault, - PG_LEVEL_2M, false); + PG_LEVEL_2M, false, &pfn); } int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, @@ -4252,6 +4254,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault); int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { + kvm_pfn_t pfn; int max_level; for (max_level = KVM_MAX_HUGEPAGE_LEVEL; @@ -4265,7 +4268,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, } return direct_page_fault(vcpu, gpa, error_code, prefault, - max_level, true); + max_level, true, &pfn); } static void nonpaging_init_context(struct kvm_vcpu *vcpu, From patchwork Fri Jul 31 21:23:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48A7B14B7 for ; Fri, 31 Jul 2020 21:23:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3BCCD2087C for ; Fri, 31 Jul 2020 21:23:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729491AbgGaVXe (ORCPT ); Fri, 31 Jul 2020 17:23:34 -0400 Received: from mga14.intel.com ([192.55.52.115]:50227 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729214AbgGaVXb (ORCPT ); Fri, 31 Jul 2020 17:23:31 -0400 IronPort-SDR: jSvX9znS4jgPNzwY9fCSJhN0VZliV99ABokru1ltozaS+1akuyWSu++E65Hqzp7gy/xJgNByaY mIhFa2rtwCMQ== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075133" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075133" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: n1XueUoeOVIDoIt4dEfR0g+zYAgDwHeFJJXQMA5ZiC6TDIlzXDPottjMbmZ0iNmQd0bWxq4Y0d BJKSY2GQ8QJg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191315" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 7/8] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV Date: Fri, 31 Jul 2020 14:23:22 -0700 Message-Id: <20200731212323.21746-8-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Introduce a helper to directly (pun intended) fault-in a TDP page without having to go through the full page fault path. This allows SEV to pin pages before booting the guest, provides the resulting pfn to vendor code if should be needed in the future, and allows the RET_PF_* enums to stay in mmu.c where they belong. 
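A hypothetical caller sketch (prefault_gfn() is an illustrative name, not part of this patch), assuming the vCPU is already loaded via vcpu_load()/kvm_mmu_load() and kvm->srcu is held, as the SEV caller added later in this series arranges:

/* Sketch only; relies on KVM-internal headers ("mmu.h" etc.) being available. */
static int prefault_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	u32 error_code = PFERR_WRITE_MASK | PFERR_USER_MASK;
	kvm_pfn_t pfn;

	/* Retries on mmu_notifier races are handled inside the helper. */
	pfn = kvm_mmu_map_tdp_page(vcpu, gfn_to_gpa(gfn), error_code,
				   PG_LEVEL_4K);
	if (is_error_noslot_pfn(pfn))
		return -EFAULT;

	/*
	 * Take an extra reference for the caller's own use of the page; for a
	 * SEV guest the long-term pinning is done by the vendor pin_spte hook
	 * while the fault is handled.
	 */
	get_page(pfn_to_page(pfn));
	return 0;
}

The caller is expected to drop its extra reference with put_page() once it is done with the page.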
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 25 +++++++++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 9f6554613babc..06f4475b8aad8 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -108,6 +108,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault); } +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level); + /* * Currently, we have two sorts of write-protection, a) the first one * write-protects guest page to sync the guest modification, b) another one is diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 92b133d7b1713..06dbc1bb79a6a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4271,6 +4271,31 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, max_level, true, &pfn); } +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level) +{ + kvm_pfn_t pfn; + int r; + + if (mmu_topup_memory_caches(vcpu, false)) + return KVM_PFN_ERR_FAULT; + + /* + * Loop on the page fault path to handle the case where an mmu_notifier + * invalidation triggers RET_PF_RETRY. In the normal page fault path, + * KVM needs to resume the guest in case the invalidation changed any + * of the page fault properties, i.e. the gpa or error code. For this + * path, the gpa and error code are fixed by the caller, and the caller + * expects failure if and only if the page fault can't be fixed. + */ + do { + r = direct_page_fault(vcpu, gpa, error_code, false, max_level, + true, &pfn); + } while (r == RET_PF_RETRY && !is_error_noslot_pfn(pfn)); + return pfn; +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page); + static void nonpaging_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *context) { From patchwork Fri Jul 31 21:23:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11695491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83D3014B7 for ; Fri, 31 Jul 2020 21:23:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 74B0E21744 for ; Fri, 31 Jul 2020 21:23:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729542AbgGaVXf (ORCPT ); Fri, 31 Jul 2020 17:23:35 -0400 Received: from mga14.intel.com ([192.55.52.115]:50224 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729197AbgGaVXb (ORCPT ); Fri, 31 Jul 2020 17:23:31 -0400 IronPort-SDR: QlZ05md2KNPYIzpbAmiKQbyIGgBXkeRSKYVEeu6fHUZuIl+fUPqDNgtdYOnT6pDFQnqCWuKbDw MPqm4m31ynRg== X-IronPort-AV: E=McAfee;i="6000,8403,9699"; a="151075134" X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="151075134" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2020 14:23:27 -0700 IronPort-SDR: cyySMoQBdLD4wZpxqY0qAajzuUxlrh3m7BtU7M5kj5Q6OmwQ6m+4iDeqyrT+RMOmRBlTHLGPB1 eQfhzUHiTzWA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,419,1589266800"; d="scan'208";a="331191319" Received: from sjchrist-coffee.jf.intel.com 
([10.54.74.160]) by orsmga007.jf.intel.com with ESMTP; 31 Jul 2020 14:23:26 -0700 From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric van tassell , Tom Lendacky Subject: [RFC PATCH 8/8] KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data() Date: Fri, 31 Jul 2020 14:23:23 -0700 Message-Id: <20200731212323.21746-9-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200731212323.21746-1-sean.j.christopherson@intel.com> References: <20200731212323.21746-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Sean Christopherson --- arch/x86/kvm/svm/sev.c | 117 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 112 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index f640b8beb443e..eb95914578497 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -15,6 +15,7 @@ #include #include +#include "mmu.h" #include "x86.h" #include "svm.h" @@ -415,6 +416,107 @@ static unsigned long get_num_contig_pages(unsigned long idx, return pages; } +#define SEV_PFERR (PFERR_WRITE_MASK | PFERR_USER_MASK) + +static void *sev_alloc_pages(unsigned long size, unsigned long *npages) +{ + /* TODO */ + *npages = 0; + return NULL; +} + +static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm, + unsigned long hva) +{ + struct kvm_memslots *slots = kvm_memslots(kvm); + struct kvm_memory_slot *memslot; + + kvm_for_each_memslot(memslot, slots) { + if (hva >= memslot->userspace_addr && + hva < memslot->userspace_addr + + (memslot->npages << PAGE_SHIFT)) + return memslot; + } + + return NULL; +} + +static bool hva_to_gpa(struct kvm *kvm, unsigned long hva) +{ + struct kvm_memory_slot *memslot; + gpa_t gpa_offset; + + memslot = hva_to_memslot(kvm, hva); + if (!memslot) + return UNMAPPED_GVA; + + gpa_offset = hva - memslot->userspace_addr; + return ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset); +} + +static struct page **sev_pin_memory_in_mmu(struct kvm *kvm, unsigned long addr, + unsigned long size, + unsigned long *npages) +{ + struct kvm_vcpu *vcpu; + struct page **pages; + unsigned long i; + kvm_pfn_t pfn; + int idx, ret; + gpa_t gpa; + + pages = sev_alloc_pages(size, npages); + if (!pages) + return ERR_PTR(-ENOMEM); + + vcpu = kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) { + kvfree(pages); + return ERR_PTR(-EINTR); + } + + vcpu_load(vcpu); + idx = srcu_read_lock(&kvm->srcu); + + kvm_mmu_load(vcpu); + + for (i = 0; i < *npages; i++, addr += PAGE_SIZE) { + if (signal_pending(current)) { + ret = -ERESTARTSYS; + goto err; + } + + if (need_resched()) + cond_resched(); + + gpa = hva_to_gpa(kvm, addr); + if (gpa == UNMAPPED_GVA) { + ret = -EFAULT; + goto err; + } + pfn = kvm_mmu_map_tdp_page(vcpu, gpa, SEV_PFERR, PG_LEVEL_4K); + if (is_error_noslot_pfn(pfn)) { + ret = -EFAULT; + goto err; + } + pages[i] = pfn_to_page(pfn); + get_page(pages[i]); + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + return pages; + +err: + for ( ; i; --i) + put_page(pages[i-1]); + + kvfree(pages); + return ERR_PTR(ret); +} + static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp) { unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i; @@ -439,9 +541,12 @@ static int sev_launch_update_data(struct kvm *kvm, 
struct kvm_sev_cmd *argp) vaddr_end = vaddr + size; /* Lock the user memory. */ - inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1); - if (!inpages) { - ret = -ENOMEM; + if (atomic_read(&kvm->online_vcpus)) + inpages = sev_pin_memory_in_mmu(kvm, vaddr, size, &npages); + else + inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1); + if (IS_ERR(inpages)) { + ret = PTR_ERR(inpages); goto e_free; } @@ -449,9 +554,11 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp) * The LAUNCH_UPDATE command will perform in-place encryption of the * memory content (i.e it will write the same memory region with C=1). * It's possible that the cache may contain the data with C=0, i.e., - * unencrypted so invalidate it first. + * unencrypted so invalidate it first. Flushing is automatically + * handled if the pages can be pinned in the MMU. */ - sev_clflush_pages(inpages, npages); + if (!atomic_read(&kvm->online_vcpus)) + sev_clflush_pages(inpages, npages); for (i = 0; vaddr < vaddr_end; vaddr = next_vaddr, i += pages) { int offset, len;
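For reference, the hva_to_memslot()/hva_to_gpa() lookup used by sev_pin_memory_in_mmu() above can be modeled in a few lines of standalone C; the memslot layout below is invented purely for illustration:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define BAD_GPA		(~0ULL)

struct memslot {
	uint64_t base_gfn;		/* first guest frame number of the slot */
	uint64_t npages;		/* slot size in pages */
	uint64_t userspace_addr;	/* host virtual address backing the slot */
};

/* Find the slot whose HVA range contains 'hva' and rebase the offset onto
 * the slot's guest-physical base, as the patch's helpers do. */
static uint64_t hva_to_gpa(const struct memslot *slots, int nslots, uint64_t hva)
{
	for (int i = 0; i < nslots; i++) {
		uint64_t start = slots[i].userspace_addr;
		uint64_t size = slots[i].npages << PAGE_SHIFT;

		if (hva >= start && hva < start + size)
			return (slots[i].base_gfn << PAGE_SHIFT) + (hva - start);
	}
	return BAD_GPA;
}

int main(void)
{
	/* Hypothetical layout: 1 MiB of guest memory at GPA 0, 1 MiB at 4 GiB. */
	struct memslot slots[] = {
		{ .base_gfn = 0x0,	.npages = 256, .userspace_addr = 0x7f0000000000ULL },
		{ .base_gfn = 0x100000,	.npages = 256, .userspace_addr = 0x7f0000200000ULL },
	};
	uint64_t gpa = hva_to_gpa(slots, 2, 0x7f0000201000ULL);

	printf("gpa = %#llx\n", (unsigned long long)gpa);	/* 0x100001000 */
	return 0;
}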