Message ID | 20220513195000.99371-2-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86/mmu: nEPT X-only unsync bug fix |
On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> that is marked shadow-present but is not-present in the page tables. If
> EPT with execute-only support is in use by L1, KVM can create a RWX=0
> SPTE can be created for an EPTE if the upper level combined permissions
> are R (or RW) and the leaf EPTE is changed from R (or RW) to X.

For some reason I found this sentence hard to read. What about this:

  When shadowing EPT and NX HugePages is enabled, if the guest changes
  the permissions on a huge page in the EPT12 to be execute-only, KVM
  will end up shadowing it with an RWX=0 SPTE in the EPT02 when it picks
  up the change in FNAME(sync_page). Note that the guest can't induce
  KVM to create a RWX=0 SPTE during FNAME(fetch), since the only valid
  way for the guest to fault in an execute-only huge page is with an
  instruction fetch, which KVM will handle by mapping the page as an
  executable 4KiB page.

> Because
> the EPTE is considered present when viewed in isolation, and no reserved
> bits are set, FNAME(prefetch_invalid_gpte) will consider the GPTE valid.
>
> Creating a not-present SPTE isn't fatal as the SPTE is "correct" in the
> sense that the guest translation is inaccessible (the combined protections
> of all levels yield RWX=0), i.e. the guest won't get stuck in an infinite
> loop. If EPT A/D bits are disabled, KVM can mistake the SPTE for an
> access-tracked SPTE. But again, such confusion isn't fatal as the "saved"
> protections are also RWX=0.
>
> Add a WARN in make_spte() to detect creation of SPTEs that will result in
> RWX=0 protections, which is the real motivation for fixing ept_sync_page().
> Creating a useless SPTE means KVM messed up _something_, even if whatever
> goof occurred doesn't manifest as a functional bug.
>
> Fixes: d95c55687e11 ("kvm: mmu: track read permission explicitly for shadow EPT page tables")
> Cc: David Matlack <dmatlack@google.com>
> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++++++-
>  arch/x86/kvm/mmu/spte.c        | 2 ++
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index b025decf610d..d9f98f9ed4a0 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1052,7 +1052,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
>  			continue;
>
> -		if (gfn != sp->gfns[i]) {
> +		/*
> +		 * Drop the SPTE if the new protections would result in a RWX=0
> +		 * SPTE or if the gfn is changing. The RWX=0 case only affects
> +		 * EPT with execute-only support, i.e. EPT without an effective
> +		 * "present" bit, as all other paging modes will create a
> +		 * read-only SPTE if pte_access is zero.
> +		 */
> +		if ((!pte_access && !shadow_present_mask) || gfn != sp->gfns[i]) {
>  			drop_spte(vcpu->kvm, &sp->spt[i]);
>  			flush = true;
>  			continue;
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 75c9e87d446a..9ad60662beac 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -101,6 +101,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  	u64 spte = SPTE_MMU_PRESENT_MASK;
>  	bool wrprot = false;
>
> +	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
> +
>  	if (sp->role.ad_disabled)
>  		spte |= SPTE_TDP_AD_DISABLED_MASK;
>  	else if (kvm_mmu_page_ad_need_write_protect(sp))
> --
> 2.36.0.550.gb090851708-goog
>
On Fri, May 13, 2022, David Matlack wrote:
> On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> > that is marked shadow-present but is not-present in the page tables. If
> > EPT with execute-only support is in use by L1, KVM can create a RWX=0
> > SPTE can be created for an EPTE if the upper level combined permissions
> > are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
>
> For some reason I found this sentence hard to read.

Heh, probably because "KVM can create a RWX=0 SPTE can be created" is nonsensical.
I botched a late edit to the changelog...

> What about this:
>
>   When shadowing EPT and NX HugePages is enabled, if the guest changes

This doesn't have anything to do with NX HugePages, it's an execute-only specific
bug where L1 can create a gPTE that is !READABLE but is considered PRESENT because
it is EXECUTABLE. If the upper level protections are R or RW, the resulting
protections for the entire translation are RWX=0. All of sync_page()'s existing
checks filter out only !PRESENT gPTEs, because without execute-only, all upper
levels are guaranteed to be at least READABLE.
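The combination described in the reply above can be illustrated with a small userspace sketch. This is not KVM code: the `PERM_*` macros and the helper names are hypothetical and exist only for this example, on the assumption that the effective protections of a translation are the bitwise AND of the permissions at each level. It shows an upper level that grants RW and a leaf that grants only X, where each entry is present in isolation yet the walk as a whole yields RWX=0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; the values follow EPT's R/W/X bits 0-2. */
#define PERM_R   0x1u
#define PERM_W   0x2u
#define PERM_X   0x4u
#define PERM_RWX (PERM_R | PERM_W | PERM_X)

/* With execute-only support, an entry counts as present if any of R/W/X is set. */
static bool entry_is_present(uint8_t perms)
{
	return (perms & PERM_RWX) != 0;
}

/* Assumption: the effective protections are the AND of every level's bits. */
static uint8_t combined_perms(const uint8_t *levels, size_t nr_levels)
{
	uint8_t perms = PERM_RWX;
	size_t i;

	for (i = 0; i < nr_levels; i++)
		perms &= levels[i];
	return perms;
}

/* Upper level is RW, leaf was changed from R (or RW) to X. */
static void example_x_only_leaf(void)
{
	const uint8_t walk[] = { PERM_R | PERM_W, PERM_X };

	/* Each entry looks present when viewed in isolation... */
	assert(entry_is_present(walk[0]) && entry_is_present(walk[1]));
	/* ...but the translation as a whole has no access rights. */
	assert(combined_perms(walk, 2) == 0);
}
```

Without execute-only support a present entry always has R set, so, as long as every level is present, the AND across levels can never reach zero in this model.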
On Fri, May 13, 2022 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, May 13, 2022, David Matlack wrote:
> > On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> > > that is marked shadow-present but is not-present in the page tables. If
> > > EPT with execute-only support is in use by L1, KVM can create a RWX=0
> > > SPTE can be created for an EPTE if the upper level combined permissions
> > > are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
> >
> > For some reason I found this sentence hard to read.
>
> Heh, probably because "KVM can create a RWX=0 SPTE can be created" is nonsensical.
> I botched a late edit to the changelog...
>
> > What about this:
> >
> >   When shadowing EPT and NX HugePages is enabled, if the guest changes
>
> This doesn't have anything to do with NX HugePages, it's an execute-only specific
> bug where L1 can create a gPTE that is !READABLE but is considered PRESENT because
> it is EXECUTABLE. If the upper level protections are R or RW, the resulting
> protections for the entire translation are RWX=0. All of sync_page()'s existing
> checks filter out only !PRESENT gPTE, because without execute-only, all upper
> levels are guaranteed to be at least READABLE.

I see what you mean, thanks.

And I also recall now you mentioned (off-list) that the NX HugePage
scenario isn't possible because KVM does not let huge pages go unsync.
On Mon, May 16, 2022, David Matlack wrote:
> On Fri, May 13, 2022 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, May 13, 2022, David Matlack wrote:
> > > On Fri, May 13, 2022 at 12:50 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
> > > > that is marked shadow-present but is not-present in the page tables. If
> > > > EPT with execute-only support is in use by L1, KVM can create a RWX=0
> > > > SPTE can be created for an EPTE if the upper level combined permissions
> > > > are R (or RW) and the leaf EPTE is changed from R (or RW) to X.
> > >
> > > For some reason I found this sentence hard to read.
> >
> > Heh, probably because "KVM can create a RWX=0 SPTE can be created" is nonsensical.
> > I botched a late edit to the changelog...
> >
> > > What about this:
> > >
> > >   When shadowing EPT and NX HugePages is enabled, if the guest changes
> >
> > This doesn't have anything to do with NX HugePages, it's an execute-only specific
> > bug where L1 can create a gPTE that is !READABLE but is considered PRESENT because
> > it is EXECUTABLE. If the upper level protections are R or RW, the resulting
> > protections for the entire translation are RWX=0. All of sync_page()'s existing
> > checks filter out only !PRESENT gPTE, because without execute-only, all upper
> > levels are guaranteed to be at least READABLE.
>
> I see what you mean, thanks.
>
> And I also recall now you mentioned (off-list) that the NX HugePage
> scenario isn't possible because KVM does not let huge pages go unsync.

Yep. The other thing that's semi-relevant and I've mentioned off-list at least
once is that our (Google's) old kernel has a different NX HugePage implementation
that _can_ result in RWX=0 SPTEs. Unlike upstream, the internal NX HugePage
implementation shatters a huge page _after_ installing said huge page, whereas
upstream demotes the huge page before it's installed. If shattering fails on a
huge page that L1 created with just X permissions, KVM is left with a RWX=0
huge page.
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index b025decf610d..d9f98f9ed4a0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1052,7 +1052,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
 			continue;
 
-		if (gfn != sp->gfns[i]) {
+		/*
+		 * Drop the SPTE if the new protections would result in a RWX=0
+		 * SPTE or if the gfn is changing. The RWX=0 case only affects
+		 * EPT with execute-only support, i.e. EPT without an effective
+		 * "present" bit, as all other paging modes will create a
+		 * read-only SPTE if pte_access is zero.
+		 */
+		if ((!pte_access && !shadow_present_mask) || gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
 			flush = true;
 			continue;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 75c9e87d446a..9ad60662beac 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -101,6 +101,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	u64 spte = SPTE_MMU_PRESENT_MASK;
 	bool wrprot = false;
 
+	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
+
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_mmu_page_ad_need_write_protect(sp))
Drop SPTEs whose new protections will yield a RWX=0 SPTE, i.e. a SPTE
that is marked shadow-present but is not-present in the page tables. If
EPT with execute-only support is in use by L1, KVM can create a RWX=0
SPTE can be created for an EPTE if the upper level combined permissions
are R (or RW) and the leaf EPTE is changed from R (or RW) to X. Because
the EPTE is considered present when viewed in isolation, and no reserved
bits are set, FNAME(prefetch_invalid_gpte) will consider the GPTE valid.

Creating a not-present SPTE isn't fatal as the SPTE is "correct" in the
sense that the guest translation is inaccessible (the combined protections
of all levels yield RWX=0), i.e. the guest won't get stuck in an infinite
loop. If EPT A/D bits are disabled, KVM can mistake the SPTE for an
access-tracked SPTE. But again, such confusion isn't fatal as the "saved"
protections are also RWX=0.

Add a WARN in make_spte() to detect creation of SPTEs that will result in
RWX=0 protections, which is the real motivation for fixing ept_sync_page().
Creating a useless SPTE means KVM messed up _something_, even if whatever
goof occurred doesn't manifest as a functional bug.

Fixes: d95c55687e11 ("kvm: mmu: track read permission explicitly for shadow EPT page tables")
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++++++-
 arch/x86/kvm/mmu/spte.c        | 2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)
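
For readers who want to reason about the new check outside the kernel, the drop condition in the FNAME(sync_page) hunk can be modelled as a standalone predicate. This is a simplified sketch, not the kernel implementation: `should_drop_spte()` and its parameters are hypothetical names, and it assumes (following the comment in the patch) that a zero `shadow_present_mask` stands for EPT with execute-only support, i.e. a paging mode with no effective "present" bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the check added to FNAME(sync_page): drop the
 * SPTE if the new protections would be RWX=0 in a mode with no present
 * bit, or if the gfn changed.
 */
static bool should_drop_spte(unsigned int pte_access,
			     uint64_t shadow_present_mask,
			     uint64_t new_gfn, uint64_t old_gfn)
{
	bool rwx_zero = !pte_access && !shadow_present_mask;

	return rwx_zero || new_gfn != old_gfn;
}

static void example_drop_checks(void)
{
	/* No access bits and no present bit: the SPTE would be RWX=0. */
	assert(should_drop_spte(0, 0, 5, 5));
	/* A non-zero present mask keeps the SPTE (read-only in the patch's terms). */
	assert(!should_drop_spte(0, 1, 5, 5));
	/* The pre-existing case: the gfn changed. */
	assert(should_drop_spte(0x4, 0, 5, 6));
}
```

The WARN_ON_ONCE() added to make_spte() tests the same `!pte_access && !shadow_present_mask` expression, so in this model any caller that filters with the predicate first would never trip the warning.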