| Message ID | 20201211141615.12489-1-andrew.cooper3@citrix.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Revert "x86/mm: drop guest_get_eff_l1e()" |
On 11.12.2020 15:16, Andrew Cooper wrote:
> This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.
>
> The change is only correct in the original context of XSA-286, where Xen's use
> of the linear pagetables was dropped.  However, performance problems
> interfered with that plan, and XSA-286 was fixed differently.
>
> This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
> was first encountered in user context.  Xen would proceed to read the
> registered LDT virtual address out of the user pagetables, not the kernel
> pagetables.
>
> Given the nature of the bug, it would have also interfered with the IO
> permission bitmap functionality of userspace, which similarly needs to read
> data using the kernel pagetables.

This paragraph wants dropping afaict - guest_io_okay() has explicit
calls to toggle_guest_pt(), and hence is unaffected by the bug here
(and there is in particular no page table reading involved there).
Then ...

> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I was wondering however whether we really want a full revert. I've
locally made the change below as an alternative.

Jan

x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables

While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
may run when guest user mode is active, and hence may need to switch to
the kernel page tables in order to retrieve an LDT page mapping.

Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This is the alternative to fully reverting the offending commit. I've
also been considering to drop the paging-mode-translate ASSERT(), now
that we always have translate == external.
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -11,10 +11,14 @@ int new_guest_cr3(mfn_t mfn);
  */
 static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
 {
+    struct vcpu *curr = current;
     l1_pgentry_t l1e;
 
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
+    ASSERT(!paging_mode_translate(curr->domain));
+    ASSERT(!paging_mode_external(curr->domain));
+
+    if ( !(curr->arch.flags & TF_kernel_mode) )
+        toggle_guest_pt(curr);
 
     if ( unlikely(!__addr_ok(linear)) ||
          __copy_from_user(&l1e,
@@ -22,6 +26,9 @@ static inline l1_pgentry_t guest_get_eff
                           sizeof(l1_pgentry_t)) )
         l1e = l1e_empty();
 
+    if ( !(curr->arch.flags & TF_kernel_mode) )
+        toggle_guest_pt(curr);
+
     return l1e;
 }
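[Editor's note: the toggle-read-toggle pattern in the patch above can be modelled outside Xen. The sketch below is a toy stand-in, not Xen code: `struct vcpu`, the two array "views", and all function bodies are hypothetical; only the control flow mirrors the patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a vCPU: either kernel or user page tables are active,
 * and each set maps the same slots to different PTE values. */
struct vcpu {
    bool kernel_mode;          /* models TF_kernel_mode */
    uint64_t user_view[4];     /* PTEs seen via user page tables */
    uint64_t kernel_view[4];   /* PTEs seen via kernel page tables */
};

/* Models toggle_guest_pt(): switch between the two page-table views. */
static void toggle_guest_pt(struct vcpu *v)
{
    v->kernel_mode = !v->kernel_mode;
}

/* Models guest_get_eff_l1e(): read through whichever view is active. */
static uint64_t guest_get_eff_l1e(const struct vcpu *v, unsigned int idx)
{
    return v->kernel_mode ? v->kernel_view[idx] : v->user_view[idx];
}

/* The pattern from the patch: force the read through the *kernel* page
 * tables, then restore whatever mode the vCPU was in beforehand. */
static uint64_t guest_get_eff_kern_l1e(struct vcpu *v, unsigned int idx)
{
    const bool user_mode = !v->kernel_mode;
    uint64_t l1e;

    if ( user_mode )
        toggle_guest_pt(v);

    l1e = guest_get_eff_l1e(v, idx);

    if ( user_mode )
        toggle_guest_pt(v);

    return l1e;
}
```

With this model, a read issued while user page tables are active still returns the kernel view's entry, and the vCPU ends up back in its original mode — which is exactly the property the LDT-faulting bug violated.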
On 14/12/2020 08:27, Jan Beulich wrote:
> On 11.12.2020 15:16, Andrew Cooper wrote:
>> This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.
>>
>> The change is only correct in the original context of XSA-286, where Xen's use
>> of the linear pagetables was dropped.  However, performance problems
>> interfered with that plan, and XSA-286 was fixed differently.
>>
>> This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
>> was first encountered in user context.  Xen would proceed to read the
>> registered LDT virtual address out of the user pagetables, not the kernel
>> pagetables.
>>
>> Given the nature of the bug, it would have also interfered with the IO
>> permission bitmap functionality of userspace, which similarly needs to read
>> data using the kernel pagetables.
> This paragraph wants dropping afaict - guest_io_okay() has
> explicit calls to toggle_guest_pt(), and hence is unaffected by
> the bug here (and there is in particular no page table reading
> involved there). Then ...
>
>> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> I was wondering however whether we really want a full revert. I've
> locally made the change below as an alternative.
>
> Jan
>
> x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables
>
> While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
> may run when guest user mode is active, and hence may need to switch to
> the kernel page tables in order to retrieve an LDT page mapping.
>
> Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This is the alternative to fully reverting the offending commit.
Hmm yes - I think I prefer this, because we don't really want to keep
the non-kern alternative.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> however ...

> I've also been considering to drop the paging-mode-translate ASSERT(),
> now that we always have translate == external.

I'd suggest not making that change here in this bugfix.  I think we do
want to alter how we do asserts like this, and there are other similarly
impacted code blocks.

>
> --- a/xen/arch/x86/pv/mm.h
> +++ b/xen/arch/x86/pv/mm.h
> @@ -11,10 +11,14 @@ int new_guest_cr3(mfn_t mfn);
>   */
>  static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
>  {
> +    struct vcpu *curr = current;
>      l1_pgentry_t l1e;
>
> -    ASSERT(!paging_mode_translate(current->domain));
> -    ASSERT(!paging_mode_external(current->domain));
> +    ASSERT(!paging_mode_translate(curr->domain));
> +    ASSERT(!paging_mode_external(curr->domain));
> +
> +    if ( !(curr->arch.flags & TF_kernel_mode) )

... pull this out into a variable, like the original code used to do.

    bool user_mode = !(curr->arch.flags & TF_kernel_mode);

I've forgotten which static checker tripped up over this form, but one
did IIRC.

~Andrew

> +        toggle_guest_pt(curr);
>
>      if ( unlikely(!__addr_ok(linear)) ||
>           __copy_from_user(&l1e,
> @@ -22,6 +26,9 @@ static inline l1_pgentry_t guest_get_eff
>                            sizeof(l1_pgentry_t)) )
>          l1e = l1e_empty();
>
> +    if ( !(curr->arch.flags & TF_kernel_mode) )
> +        toggle_guest_pt(curr);
> +
>      return l1e;
>  }
>
On 14.12.2020 14:21, Andrew Cooper wrote:
> On 14/12/2020 08:27, Jan Beulich wrote:
>> On 11.12.2020 15:16, Andrew Cooper wrote:
>>> This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.
>>>
>>> The change is only correct in the original context of XSA-286, where Xen's use
>>> of the linear pagetables was dropped.  However, performance problems
>>> interfered with that plan, and XSA-286 was fixed differently.
>>>
>>> This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
>>> was first encountered in user context.  Xen would proceed to read the
>>> registered LDT virtual address out of the user pagetables, not the kernel
>>> pagetables.
>>>
>>> Given the nature of the bug, it would have also interfered with the IO
>>> permission bitmap functionality of userspace, which similarly needs to read
>>> data using the kernel pagetables.
>> This paragraph wants dropping afaict - guest_io_okay() has
>> explicit calls to toggle_guest_pt(), and hence is unaffected by
>> the bug here (and there is in particular no page table reading
>> involved there). Then ...
>>
>>> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> I was wondering however whether we really want a full revert. I've
>> locally made the change below as an alternative.
>>
>> Jan
>>
>> x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables
>>
>> While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
>> may run when guest user mode is active, and hence may need to switch to
>> the kernel page tables in order to retrieve an LDT page mapping.
>>
>> Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
>> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> This is the alternative to fully reverting the offending commit.
>
> Hmm yes - I think I prefer this, because we don't really want to keep
> the non-kern alternative.
>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> however ...

Thanks.

>> I've also been considering to drop the paging-mode-translate ASSERT(),
>> now that we always have translate == external.
>
> I'd suggest not making that change here in this bugfix.  I think we do
> want to alter how we do asserts like this, and there are other similarly
> impacted code blocks.

Okay, will look forward to learn what exactly you have in mind.

>> --- a/xen/arch/x86/pv/mm.h
>> +++ b/xen/arch/x86/pv/mm.h
>> @@ -11,10 +11,14 @@ int new_guest_cr3(mfn_t mfn);
>>   */
>>  static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
>>  {
>> +    struct vcpu *curr = current;
>>      l1_pgentry_t l1e;
>>
>> -    ASSERT(!paging_mode_translate(current->domain));
>> -    ASSERT(!paging_mode_external(current->domain));
>> +    ASSERT(!paging_mode_translate(curr->domain));
>> +    ASSERT(!paging_mode_external(curr->domain));
>> +
>> +    if ( !(curr->arch.flags & TF_kernel_mode) )
>
> ... pull this out into a variable, like the original code used to do.
>
>     bool user_mode = !(curr->arch.flags & TF_kernel_mode);
>
> I've forgotten which static checker tripped up over this form, but one
> did IIRC.

I've made the change (will send the result in a minute), but I'm curious
not so much which checker might have taken issue here but why.

Jan
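[Editor's note: one plausible reason a checker prefers the hoisted form, sketched as a toy below (hypothetical code, not Xen's): with a single `const bool` evaluated once, the guard for the entry toggle and the guard for the exit toggle are provably the same value, so the two toggles always pair up. Re-testing a flag that could in principle change between the two tests makes that pairing harder to prove.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the active page-table view:
 * 0 = original view, 1 = toggled view. */
static int toggled;

static void toggle_guest_pt(void)
{
    toggled ^= 1;
}

/* Hoisted form: the condition is evaluated exactly once, so the
 * entry and exit toggles are guarded by the identical value and the
 * function always leaves "toggled" as it found it. */
static int read_through_kernel_pt(bool in_user_mode)
{
    const bool user_mode = in_user_mode;  /* single evaluation */
    int value;

    if ( user_mode )
        toggle_guest_pt();

    value = toggled;  /* "read" while on the kernel tables */

    if ( user_mode )
        toggle_guest_pt();  /* provably matches the first guard */

    return value;
}
```

In Xen's actual function the flag cannot change under the vCPU's feet, so the two forms behave identically; the hoisted variable mainly makes that invariant visible to tools (and to readers).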
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 5d74d11cba..14cb0f2d4e 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -56,6 +56,27 @@ l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn)
 }
 
 /*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
+{
+    struct vcpu *curr = current;
+    const bool user_mode = !(curr->arch.flags & TF_kernel_mode);
+    l1_pgentry_t l1e;
+
+    if ( user_mode )
+        toggle_guest_pt(curr);
+
+    l1e = guest_get_eff_l1e(linear);
+
+    if ( user_mode )
+        toggle_guest_pt(curr);
+
+    return l1e;
+}
+
+/*
  * Map a guest's LDT page (covering the byte at @offset from start of the LDT)
  * into Xen's virtual range.  Returns true if the mapping changed, false
  * otherwise.
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index 2a21859dd4..b1b66e46c8 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -5,11 +5,8 @@ l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn);
 
 int new_guest_cr3(mfn_t mfn);
 
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
+/* Read a PV guest's l1e that maps this linear address. */
+static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
 {
     l1_pgentry_t l1e;
 
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index 8d0007ede5..7f6fbc92fb 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -342,7 +342,7 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
     bool mmio_ro;
 
     /* Attempt to read the PTE that maps the VA being accessed. */
-    pte = guest_get_eff_kern_l1e(addr);
+    pte = guest_get_eff_l1e(addr);
 
     /* We are only looking for read-only mappings */
     if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT | _PAGE_RW)) != _PAGE_PRESENT) )