Message ID | 20220420131204.2850-3-jiangshanlai@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: X86/MMU: Fix problem for shadowing 5-level NPT for 4-level NPT L1 guest |
On 4/20/22 15:12, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> When shadowing 5-level NPT for a 4-level NPT L1 guest, the root_sp is
> allocated with role.level = 5 and the guest pagetable's root gfn.
>
> And root_sp->spt[0] is also allocated with the same gfn and the same
> role except role.level = 4.  Luckily they are different shadow pages,
> but only root_sp->spt[0] is the real translation of the guest
> pagetable.
>
> Here comes a problem:
>
> If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice versa)
> and uses the same gfn as the root page for nested NPT before and after
> the switch, the host (hCR4_LA57=1) might reuse the same root_sp for
> the guest even though gCR4_LA57 has changed.  The guest will then see
> unexpected pages mapped, and L2 may exploit the bug to hurt L1.  It is
> lucky that the problem can't hurt L0.
>
> Three special cases need to be handled:
>
> The root_sp should behave like role.direct=1 in some respects: its
> contents are not backed by gptes, so root_sp->gfns is meaningless.
> (For a normal high-level sp in shadow paging, sp->gfns is often unused
> and kept zero, but it could be relevant and meaningful when used,
> because those entries are backed by concrete gptes.)
>
> For such a root_sp, root_sp is just a portal that contributes
> root_sp->spt[0]; root_sp->gfns should not be used, and root_sp->spt[0]
> should not be dropped if gpte[0] of the guest root pagetable is
> changed.
>
> Such a root_sp should not be accounted either.
>
> So add role.passthrough to distinguish the shadow pages in the hash
> when gCR4_LA57 is toggled, and fix the above special cases by using it
> in kvm_mmu_page_{get|set}_gfn() and sp_has_gptes().
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Looks good, and is easy to backport; thanks.

However, once we also have your series "KVM: X86/MMU: Use one-off
special shadow page for special roots", perhaps the passthrough sp can
be converted to a special root (but now with a shadow_page) as well?

Paolo
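For readers following the commit message, here is a reduced, standalone
sketch of the stale-root reuse it describes. The structures and names
below are simplified stand-ins, not KVM's real kvm_mmu_page or role
handling; the point is only to show why a role that cannot express
"5-level shadow of a 4-level guest root" lets the hash return the wrong
page when the guest toggles gCR4_LA57 but keeps the same root gfn.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for union kvm_mmu_page_role. */
struct demo_role {
        unsigned level:4;
        unsigned direct:1;
        unsigned passthrough:1; /* the bit this patch adds */
};

/* Simplified stand-in for struct kvm_mmu_page. */
struct demo_sp {
        uint64_t gfn;
        struct demo_role role;
};

/* The hash lookup reuses a page only if gfn and the whole role match. */
static bool demo_sp_matches(const struct demo_sp *sp, uint64_t gfn,
                            struct demo_role role)
{
        return sp->gfn == gfn &&
               sp->role.level == role.level &&
               sp->role.direct == role.direct &&
               sp->role.passthrough == role.passthrough;
}

int main(void)
{
        /* Root created while the guest ran with gCR4_LA57=1 (5-level gptes). */
        struct demo_sp old_root = { .gfn = 0x1000, .role = { .level = 5 } };

        /*
         * The guest toggles to gCR4_LA57=0 but keeps the same root gfn.
         * The host (hCR4_LA57=1) still builds a 5-level shadow root.
         */
        struct demo_role before_patch = { .level = 5 };
        struct demo_role after_patch  = { .level = 5, .passthrough = 1 };

        printf("pre-patch lookup reuses stale root:  %s\n",
               demo_sp_matches(&old_root, 0x1000, before_patch) ? "yes" : "no");
        printf("post-patch lookup reuses stale root: %s\n",
               demo_sp_matches(&old_root, 0x1000, after_patch) ? "yes" : "no");
        return 0;
}

With the new bit in the role, the two configurations hash to different
shadow pages even though gfn and level are identical.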
On Thu, Apr 21, 2022 at 1:15 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Looks good, and is easy to backport; thanks.
>
> However, once we also have your series "KVM: X86/MMU: Use one-off
> special shadow page for special roots", perhaps the passthrough sp can
> be converted to a special root (but now with a shadow_page) as well?
>

It will be much more complicated.

The current code allows fast_pgd_switch() when shadowing 5-level NPT for
4-level NPT L1 guests, because both the host and the guest are 64-bit.

Using a special shadow page will disallow fast_pgd_switch() and will
make using_special_root_page() more complicated; it would require a
different kind of special shadow page to keep the fast switch possible.
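To make the fast_pgd_switch() point concrete, here is a minimal sketch
of the idea behind cached-root reuse, using assumed names and a reduced
layout rather than KVM's actual prev_roots machinery: a previously used
root can be reused only when both its pgd and its complete role match
the new ones. Because a passthrough root still carries an ordinary
shadow page and role, it can participate in this matching; a root with
no backing shadow page would need extra machinery to be recognized.

#include <stdbool.h>
#include <stdint.h>

/* Assumed, simplified stand-in for a cached previous root. */
struct demo_cached_root {
        uint64_t pgd;       /* guest root the shadow root was built for */
        uint32_t role_word; /* packed page role, passthrough bit included */
        uint64_t root_hpa;  /* host physical address of the shadow root */
};

/*
 * A cached root can be switched to without rebuilding the shadow tree
 * only if both identifiers match; differing roles (e.g. passthrough
 * toggled by gCR4_LA57) select a different cached entry or force a new
 * root instead of silently reusing the wrong one.
 */
bool demo_root_usable(const struct demo_cached_root *root,
                      uint64_t new_pgd, uint32_t new_role_word)
{
        return root->pgd == new_pgd && root->role_word == new_role_word;
}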
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 5b1ebad24c77..4018b9d7a0d3 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -202,6 +202,9 @@ Shadow pages contain the following information:
     Is 1 if the MMU instance cannot use A/D bits. EPT did not have A/D
     bits before Haswell; shadow EPT page tables also cannot use A/D bits
     if the L1 hypervisor does not enable them.
+  role.passthrough:
+    Is 1 if role.level = 5 when shadowing 5-level shadow NPT for
+    4-level NPT L1.
   gfn:
     Either the guest page table containing the translations shadowed by this
     page, or the base page frame for linear translations. See role.direct.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9694dd5e6ccc..d4f8f4784d87 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -285,7 +285,7 @@ struct kvm_kernel_irq_routing_entry;
  * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
  * 2 bytes per gfn instead of 4 bytes per gfn.
  *
- * Indirect upper-level shadow pages are tracked for write-protection via
+ * Upper-level shadow pages having gptes are tracked for write-protection via
  * gfn_track. As above, gfn_track is a 16 bit counter, so KVM must not create
  * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
  * gfn_track will overflow and explosions will ensure.
@@ -331,7 +331,8 @@ union kvm_mmu_page_role {
         unsigned smap_andnot_wp:1;
         unsigned ad_disabled:1;
         unsigned guest_mode:1;
-        unsigned :6;
+        unsigned passthrough:1;
+        unsigned :5;
 
         /*
          * This is left at the top of the word so that
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1bdff55218ef..d14cb6f99cb1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -737,6 +737,9 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
 
 static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 {
+        if (sp->role.passthrough)
+                return sp->gfn;
+
         if (!sp->role.direct)
                 return sp->gfns[index];
 
@@ -745,6 +748,11 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 
 static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
 {
+        if (sp->role.passthrough) {
+                WARN_ON_ONCE(gfn != sp->gfn);
+                return;
+        }
+
         if (!sp->role.direct) {
                 sp->gfns[index] = gfn;
                 return;
@@ -1861,6 +1869,9 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
         if (sp->role.direct)
                 return false;
 
+        if (sp->role.passthrough)
+                return false;
+
         return true;
 }
 
@@ -2059,6 +2070,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                 quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
                 role.quadrant = quadrant;
         }
+        if (level <= vcpu->arch.mmu->root_level)
+                role.passthrough = 0;
 
         sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
         for_each_valid_sp(vcpu->kvm, sp, sp_list) {
@@ -4890,6 +4903,9 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu,
 
         role.base.direct = false;
         role.base.level = kvm_mmu_get_tdp_level(vcpu);
+        if (role.base.level == PT64_ROOT_5LEVEL &&
+            role_regs_to_root_level(regs) == PT64_ROOT_4LEVEL)
+                role.base.passthrough = 1;
 
         return role;
 }
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 8621188b46df..c1b975fb85a2 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1042,6 +1042,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
                 .level = 0xf,
                 .access = 0x7,
                 .quadrant = 0x3,
+                .passthrough = 0x1,
         };
 
         /*
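The paging_tmpl.h hunk adds the new bit to a role value whose fields
are deliberately set to all-ones; in FNAME(sync_page) that value serves
as an ignore mask when the shadow page's role is compared against the
current MMU role, so fields set there are excluded from the comparison
and passthrough must be listed or the check would spuriously mismatch.
A reduced, self-contained illustration of that mask-based comparison
follows; the field layout is simplified and not the kernel's exact
union.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for union kvm_mmu_page_role: word overlays the bits. */
union demo_role {
        uint32_t word;
        struct {
                unsigned level:4;
                unsigned access:3;
                unsigned quadrant:2;
                unsigned passthrough:1;
        };
};

/* Fields set to all-ones in 'ign' are excluded from the comparison. */
static bool demo_roles_match(union demo_role a, union demo_role b,
                             union demo_role ign)
{
        return ((a.word ^ b.word) & ~ign.word) == 0;
}

int main(void)
{
        const union demo_role ign = {
                .level = 0xf, .access = 0x7, .quadrant = 0x3, .passthrough = 0x1,
        };
        union demo_role sp_role  = { .level = 5, .passthrough = 1 };
        union demo_role mmu_role = { .level = 4 };

        /* Level and passthrough differ, but both are masked out. */
        printf("equal with new mask: %s\n",
               demo_roles_match(sp_role, mmu_role, ign) ? "yes" : "no");

        /* Without .passthrough in the mask, the same comparison would fail. */
        const union demo_role ign_old = {
                .level = 0xf, .access = 0x7, .quadrant = 0x3,
        };
        printf("equal with old mask: %s\n",
               demo_roles_match(sp_role, mmu_role, ign_old) ? "yes" : "no");
        return 0;
}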