| Message ID | 20210812043630.2686-1-jiangshanlai@gmail.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | [1/2] KVM: X86: Check pte present first in __shadow_walk_next() |
On Thu, Aug 12, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> So far, the loop bodies already ensure the pte is present before calling
> __shadow_walk_next().  But checking pte present in __shadow_walk_next()
> is a more prudent way of programming, and the loop bodies will not need
> to always check it.  It allows us to remove is_shadow_present_pte()
> from the loop bodies.

There needs to be more analysis in the changelog, as there are many more callers
to __shadow_walk_next() than the three that are modified in the next patch.

It might even make sense to squash the two patches together, i.e. make it a "move"
instead of an "add + remove", and then explicitly explain why it's ok to add the
check for paths that do _not_ currently have a !is_shadow_present_pte() check in
the loop body.  Specifically, FNAME(fetch) via shadow_walk_next() and
__direct_map() via for_each_shadow_entry() do not currently terminate their walks
with a !PRESENT, but they get away with it because they install present non-leaf
SPTEs in the loop itself.

The other argument for the audit (changelog+patch) of all users is that the next
patch misses FNAME(invlpg), e.g.

@@ -977,7 +980,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			FNAME(update_pte)(vcpu, sp, sptep, &gpte);
 		}
 
-		if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+		if (!sp->unsync_children)
 			break;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);

It would also be worthwhile to document via the changelog that terminating on
!is_shadow_present_pte() is 100% the correct behavior, as walking past a !PRESENT
SPTE would lead to attempting to read the next level SPTE from a garbage
iter->shadow_addr.

And for clarity and safety, I think it would be worth adding the patch below as
a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
in a page should never terminate early.

From 1c202a7e82b1931e4eb37b23aa9d7108340a6cd2 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 12 Aug 2021 08:38:35 -0700
Subject: [PATCH] KVM: x86/mmu: Verify shadow walk doesn't terminate early in
 page faults

WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE.  The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next().  In other words, the walks guarantee that the
walk will stop if and only if the target level is reached by installing
non-leaf SPTEs to guarantee the walk remains valid.

Opportunistically use fault->goal_level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 3 +++
 arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..2a243ae1d64c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2992,6 +2992,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			account_huge_nx_page(vcpu->kvm, sp);
 	}
 
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
 	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
 			   fault->write, fault->goal_level, base_gfn, fault->pfn,
 			   fault->prefault, fault->map_writable);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f70afecbf3a2..3a8a7b2f9979 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		}
 	}
 
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
 	ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
-			   it.level, base_gfn, fault->pfn, fault->prefault,
-			   fault->map_writable);
+			   fault->goal_level, base_gfn, fault->pfn,
+			   fault->prefault, fault->map_writable);
 
 	if (ret == RET_PF_SPURIOUS)
 		return ret;
-- 
2.33.0.rc1.237.g0d66db33f3-goog
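To make the invariant behind that WARN concrete, here is a minimal sketch of the
page-fault walk pattern, not the real __direct_map()/FNAME(fetch) bodies: the
loop installs a present non-leaf SPTE at every level above the goal, so the loop
control never observes a !PRESENT SPTE and the walk can only stop at
fault->goal_level.  install_nonleaf_spte() is a hypothetical stand-in for the
real shadow-page allocation and linking; the other names are taken from the
hunks above and may differ slightly from the exact code of this era.

	for_each_shadow_entry(vcpu, fault->addr, it) {
		if (it.level == fault->goal_level)
			break;

		/*
		 * Guarantee *it.sptep is a present non-leaf SPTE before the
		 * next loop iteration, so that __shadow_walk_next() cannot
		 * terminate the walk early.  (Hypothetical helper.)
		 */
		if (!is_shadow_present_pte(*it.sptep))
			install_nonleaf_spte(vcpu, &it);
	}

	/* With the prep patch, a walk that stopped early becomes a loud bug. */
	if (WARN_ON_ONCE(it.level != fault->goal_level))
		return -EFAULT;

	/* ... install the leaf SPTE at fault->goal_level via mmu_set_spte() ... */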
On 2021/8/13 00:03, Sean Christopherson wrote:
>
> And for clarity and safety, I think it would be worth adding the patch below as
> a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
> in a page should never terminate early.

I've sent a V2 patch which was updated as you suggested.

The patch you provided below doesn't need to be updated.  It and my V2 patch do
not depend on each other, so I didn't resend your patch together with mine.

For your patch:

Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>

And it can be queued earlier.

> From 1c202a7e82b1931e4eb37b23aa9d7108340a6cd2 Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 12 Aug 2021 08:38:35 -0700
> Subject: [PATCH] KVM: x86/mmu: Verify shadow walk doesn't terminate early in
>  page faults
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..c48ecb25d5f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
 			       u64 spte)
 {
-	if (is_last_spte(spte, iterator->level)) {
+	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
 		iterator->level = 0;
 		return;
 	}
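For context on why terminating on a !PRESENT SPTE is the only safe behavior,
the remainder of __shadow_walk_next() below this hunk derives the next level's
table address directly from the SPTE's address bits.  The lines below are
reconstructed from the mmu.c of this era, so treat the exact macro name as
approximate; the point is that a !PRESENT SPTE carries no valid address, hence
the new early-out above.

	/*
	 * Rest of __shadow_walk_next(): descend one level using the address
	 * encoded in the (now guaranteed present, non-leaf) SPTE.  Without
	 * the new check, a !PRESENT SPTE would feed junk into shadow_addr
	 * and the next iteration would read a bogus "SPTE" from it.
	 */
	iterator->shadow_addr = spte & PT64_BASE_ADDR_MASK;
	--iterator->level;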