Message ID | 20090403214548.GA5394@amt.cnet (mailing list archive)
---|---
State | Not Applicable
Marcelo Tosatti wrote:
> On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
> >> index 2ea8262..48169d7 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> >>  			kvm_write_guest_time(vcpu);
> >>  		if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
> >>  			kvm_mmu_sync_roots(vcpu);
> >> +		if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
> >> +			kvm_mmu_sync_global(vcpu);
> >>  		if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
> >>  			kvm_x86_ops->tlb_flush(vcpu);
> >>  		if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
> >
> > Windows will (I think) write a PDE on every context switch, so this
> > effectively disables global unsync for that guest.
> >
> > What about recursively syncing the newly linked page in FNAME(fetch)()?
> > If the page isn't global, this becomes a no-op, so no new overhead. The
> > only question is the expense when linking a populated top-level page,
> > especially in long mode.
>
> How about this?
>
> KVM: MMU: sync global pages on fetch()
>
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one of its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
>
> So sync global pages in fetch().
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 09782a9..728be72 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  			break;
>  		}
>
> -		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> +		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
> +			if (level-1 == PT_PAGE_TABLE_LEVEL) {
> +				shadow_page = page_header(__pa(sptep));
> +				if (shadow_page->unsync && shadow_page->global)
> +					kvm_sync_page(vcpu, shadow_page);
> +			}
>  			continue;
> +		}
>
>  		if (is_large_pte(*sptep)) {
>  			rmap_remove(vcpu->kvm, sptep);

But here the shadow page is already linked? Isn't the root cause that an
invlpg was called when the page wasn't linked, so it wasn't seen by
invlpg?

So I thought the best place would be in fetch(), after
kvm_mmu_get_page(). If we're linking a page which contains global ptes,
they might be unsynced due to invlpgs that we've missed.

Or am I missing something about the root cause?
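(For reference, a minimal sketch of the placement Avi is suggesting:
resync right after the shadow page is looked up or created in
FNAME(fetch)(). Illustrative only; the kvm_mmu_get_page() argument list
is abbreviated from that era's fetch(), and this is not committed code.)

	shadow_page = kvm_mmu_get_page(vcpu, table_gfn, addr, level - 1,
				       metaphysical, access, sptep);
	/*
	 * A newly linked global page may hold translations made stale
	 * by invlpgs issued while it was unreachable; resync it here.
	 * For non-global pages this is a no-op, so no overhead in the
	 * common case.
	 */
	if (shadow_page->global && shadow_page->unsync)
		kvm_sync_page(vcpu, shadow_page);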
On Sat, Apr 04, 2009 at 01:37:39PM +0300, Avi Kivity wrote:
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 09782a9..728be72 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>>  			break;
>>  		}
>>
>> -		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>> +		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
>> +			if (level-1 == PT_PAGE_TABLE_LEVEL) {
>> +				shadow_page = page_header(__pa(sptep));
>> +				if (shadow_page->unsync && shadow_page->global)
>> +					kvm_sync_page(vcpu, shadow_page);
>> +			}
>>  			continue;
>> +		}
>>
>>  		if (is_large_pte(*sptep)) {
>>  			rmap_remove(vcpu->kvm, sptep);
>
> But here the shadow page is already linked? Isn't the root cause that
> an invlpg was called when the page wasn't linked, so it wasn't seen by
> invlpg?
>
> So I thought the best place would be in fetch(), after
> kvm_mmu_get_page(). If we're linking a page which contains global ptes,
> they might be unsynced due to invlpgs that we've missed.
>
> Or am I missing something about the root cause?

The problem is when the page is unreachable due to a higher level path
being unlinked. Say:

level 4 -> level 3 . level 2 -> level 1 (global unsync)

The dot there means level 3 is not linked to level 2, so invlpg can't
reach the global unsync at level 1.

kvm_mmu_get_page does sync pages when it finds them, so the code is
already safe for the "linking a page which contains global ptes" case
you mention above.
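(To see why the recursive walk misses, here is a simplified sketch of
the shape of FNAME(invlpg) in that tree; abbreviated, not the exact
code.)

	static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
	{
		struct kvm_shadow_walk_iterator iterator;

		for_each_shadow_entry(vcpu, gva, iterator) {
			/*
			 * With the level 3 -> level 2 link gone, the walk
			 * stops at the missing entry; the global unsync
			 * page at level 1 is never visited, and its stale
			 * spte survives the invlpg.
			 */
			if (!is_shadow_present_pte(*iterator.sptep))
				break;
			/* ...at PT_PAGE_TABLE_LEVEL, zap the spte... */
		}
	}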
On Fri, Apr 03, 2009 at 06:45:48PM -0300, Marcelo Tosatti wrote:
> On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
> >> index 2ea8262..48169d7 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> >>  			kvm_write_guest_time(vcpu);
> >>  		if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
> >>  			kvm_mmu_sync_roots(vcpu);
> >> +		if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
> >> +			kvm_mmu_sync_global(vcpu);
> >>  		if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
> >>  			kvm_x86_ops->tlb_flush(vcpu);
> >>  		if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
> >
> > Windows will (I think) write a PDE on every context switch, so this
> > effectively disables global unsync for that guest.
> >
> > What about recursively syncing the newly linked page in FNAME(fetch)()?
> > If the page isn't global, this becomes a no-op, so no new overhead. The
> > only question is the expense when linking a populated top-level page,
> > especially in long mode.
>
> How about this?
>
> KVM: MMU: sync global pages on fetch()
>
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one of its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
>
> So sync global pages in fetch().
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

I have tried this patch, and unfortunately it does not solve the
original problem, while the previous one did.
Marcelo Tosatti wrote:
> The problem is when the page is unreachable due to a higher level path
> being unlinked. Say:
>
> level 4 -> level 3 . level 2 -> level 1 (global unsync)
>
> The dot there means level 3 is not linked to level 2, so invlpg can't
> reach the global unsync at level 1.
>
> kvm_mmu_get_page does sync pages when it finds them, so the code is
> already safe for the "linking a page which contains global ptes" case
> you mention above.

The recursive resync ignores global pages (or it would hit them on cr3
switch too).

But we have a bigger problem: invlpg can miss even if nothing is
unlinked:

access address x through global pte
  -> pte instantiated into spte
switch to cr3 where x is not mapped, or mapped differently
write to pte
  -> no fault since pte is unsync
invlpg x
  -> no way this can hit the spte
switch cr3 back
access address x
  -> use old spte

Here's one way to make this work:

- add a hash of global pagetables, indexed by virtual address instead
  of the pagetable's gfn
- invlpg checks this hash in addition to the recursive walk

We'd need to make the virtual address part of sp->role to avoid needing
to link the same page multiple times in the virtual address hash.
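(A hedged sketch of the structure Avi proposes. None of these names
exist in the tree: global_pt_hash, hash_link, role.gva and
invlpg_check_global() are hypothetical, and the sketch assumes each 4K
page table maps a 2MB region, as with PAE paging.)

	#define GLOBAL_PT_HASH_SIZE	256

	/* in struct kvm_arch: global shadow page tables, keyed by gva */
	struct hlist_head global_pt_hash[GLOBAL_PT_HASH_SIZE];

	static unsigned global_pt_hashfn(gva_t gva)
	{
		return (gva >> 21) % GLOBAL_PT_HASH_SIZE; /* 2MB per PTE page */
	}

	/* invlpg: probe the gva hash in addition to the recursive walk */
	static void invlpg_check_global(struct kvm_vcpu *vcpu, gva_t gva)
	{
		struct kvm_mmu_page *sp;
		struct hlist_node *node;
		struct hlist_head *bucket =
			&vcpu->kvm->arch.global_pt_hash[global_pt_hashfn(gva)];

		hlist_for_each_entry(sp, node, bucket, hash_link)
			/* role.gva makes the va part of the page's identity */
			if (sp->global && sp->unsync &&
			    sp->role.gva == (gva & ~((1ul << 21) - 1)))
				kvm_sync_page(vcpu, sp);
	}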
On Sun, Apr 05, 2009 at 11:41:39AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The problem is when the page is unreachable due to a higher level path
>> being unlinked. Say:
>>
>> level 4 -> level 3 . level 2 -> level 1 (global unsync)
>>
>> The dot there means level 3 is not linked to level 2, so invlpg can't
>> reach the global unsync at level 1.
>>
>> kvm_mmu_get_page does sync pages when it finds them, so the code is
>> already safe for the "linking a page which contains global ptes" case
>> you mention above.
>
> The recursive resync ignores global pages (or it would hit them on cr3
> switch too).
>
> But we have a bigger problem: invlpg can miss even if nothing is
> unlinked:
>
> access address x through global pte
>   -> pte instantiated into spte
> switch to cr3 where x is not mapped, or mapped differently
> write to pte
>   -> no fault since pte is unsync
> invlpg x
>   -> no way this can hit the spte
> switch cr3 back
> access address x
>   -> use old spte
>
> Here's one way to make this work:
>
> - add a hash of global pagetables, indexed by virtual address instead
>   of the pagetable's gfn
> - invlpg checks this hash in addition to the recursive walk
>
> We'd need to make the virtual address part of sp->role to avoid needing
> to link the same page multiple times in the virtual address hash.

Humpf, yes. It seems it's too expensive/complex to handle this, for such
a small gain (~= 2% on AIM7 with a RHEL3 guest).

Are you okay with just disabling the global pages optimization?
Marcelo Tosatti wrote:
>> Here's one way to make this work:
>>
>> - add a hash of global pagetables, indexed by virtual address instead
>>   of the pagetable's gfn
>> - invlpg checks this hash in addition to the recursive walk
>>
>> We'd need to make the virtual address part of sp->role to avoid needing
>> to link the same page multiple times in the virtual address hash.
>
> Humpf, yes. It seems it's too expensive/complex to handle this, for such
> a small gain (~= 2% on AIM7 with a RHEL3 guest).
>
> Are you okay with just disabling the global pages optimization?

Definitely, to plug the hole; and probably for later as well, unless
people cry out due to regressions.

Please send it as two patches: a trivial one to disable global page
detection, which can be sent to -stable as well, and a follow-on which
rips out the global page machinery until (and if) we decide to
reimplement it correctly.
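(The trivial disable might look something like the sketch below: stop
honoring the guest's PTE.G bit when shadow ptes are created, so no
shadow page is ever marked global and every unsync page stays reachable
from its own cr3. This is a guess at the shape of the fix, assuming the
set_spte() of that era took a 'global' flag derived from the guest pte;
it is not the actual committed patch.)

	static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
			    unsigned pte_access, int user_fault, int write_fault,
			    int dirty, int largepage, int global, gfn_t gfn,
			    pfn_t pfn, bool speculative, bool can_unsync)
	{
		global = 0;	/* global page optimization disabled */
		/* ...rest of set_spte() unchanged... */
	}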
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 09782a9..728be72 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			break;
 		}
 
-		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+			if (level-1 == PT_PAGE_TABLE_LEVEL) {
+				shadow_page = page_header(__pa(sptep));
+				if (shadow_page->unsync && shadow_page->global)
+					kvm_sync_page(vcpu, shadow_page);
+			}
 			continue;
+		}
 
 		if (is_large_pte(*sptep)) {
 			rmap_remove(vcpu->kvm, sptep);