Revert "x86/mm: drop guest_get_eff_l1e()"

Message ID 20201211141615.12489-1-andrew.cooper3@citrix.com (mailing list archive)

Commit Message

Andrew Cooper Dec. 11, 2020, 2:16 p.m. UTC
This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.

The change is only correct in the original context of XSA-286, where Xen's use
of linear pagetables was dropped.  However, performance problems
interfered with that plan, and XSA-286 was fixed differently.

This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
was first encountered in user context.  Xen would proceed to read the
registered LDT virtual address out of the user pagetables, not the kernel
pagetables.

Given the nature of the bug, it would have also interfered with the IO
permission bitmap functionality of userspace, which similarly needs to read
data using the kernel pagetables.

Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Manuel Bouyer <bouyer@antioche.eu.org>

There is also a bug with Xen's IRET handling, but that has been broken forever
and is much more complicated to fix.  I'll put it on my TODO list, but no idea
when I'll get around to addressing it.
---
 xen/arch/x86/pv/mm.c            | 21 +++++++++++++++++++++
 xen/arch/x86/pv/mm.h            |  7 ++-----
 xen/arch/x86/pv/ro-page-fault.c |  2 +-
 3 files changed, 24 insertions(+), 6 deletions(-)

Comments

Jan Beulich Dec. 14, 2020, 8:27 a.m. UTC | #1
On 11.12.2020 15:16, Andrew Cooper wrote:
> This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.
> 
> The change is only correct in the original context of XSA-286, where Xen's use
> of linear pagetables was dropped.  However, performance problems
> interfered with that plan, and XSA-286 was fixed differently.
> 
> This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
> was first encountered in user context.  Xen would proceed to read the
> registered LDT virtual address out of the user pagetables, not the kernel
> pagetables.
> 
> Given the nature of the bug, it would have also interfered with the IO
> permission bitmap functionality of userspace, which similarly needs to read
> data using the kernel pagetables.

This paragraph wants dropping afaict - guest_io_okay() has
explicit calls to toggle_guest_pt(), and hence is unaffected by
the bug here (and there is, in particular, page-table reading
involved there). Then ...

> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I was wondering however whether we really want a full revert. I've
locally made the change below as an alternative.

Jan

x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables

While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
may run when guest user mode is active, and hence may need to switch to
the kernel page tables in order to retrieve an LDT page mapping.

Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This is the alternative to fully reverting the offending commit.

I've also been considering dropping the paging-mode-translate ASSERT(),
now that we always have translate == external.

--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -11,10 +11,14 @@ int new_guest_cr3(mfn_t mfn);
  */
 static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
 {
+    struct vcpu *curr = current;
     l1_pgentry_t l1e;
 
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
+    ASSERT(!paging_mode_translate(curr->domain));
+    ASSERT(!paging_mode_external(curr->domain));
+
+    if ( !(curr->arch.flags & TF_kernel_mode) )
+        toggle_guest_pt(curr);
 
     if ( unlikely(!__addr_ok(linear)) ||
          __copy_from_user(&l1e,
@@ -22,6 +26,9 @@ static inline l1_pgentry_t guest_get_eff
                           sizeof(l1_pgentry_t)) )
         l1e = l1e_empty();
 
+    if ( !(curr->arch.flags & TF_kernel_mode) )
+        toggle_guest_pt(curr);
+
     return l1e;
 }
Andrew Cooper Dec. 14, 2020, 1:21 p.m. UTC | #2
On 14/12/2020 08:27, Jan Beulich wrote:
> On 11.12.2020 15:16, Andrew Cooper wrote:
>> This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.
>>
>> The change is only correct in the original context of XSA-286, where Xen's use
>> of linear pagetables was dropped.  However, performance problems
>> interfered with that plan, and XSA-286 was fixed differently.
>>
>> This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
>> was first encountered in user context.  Xen would proceed to read the
>> registered LDT virtual address out of the user pagetables, not the kernel
>> pagetables.
>>
>> Given the nature of the bug, it would have also interfered with the IO
>> permission bitmap functionality of userspace, which similarly needs to read
>> data using the kernel pagetables.
> This paragraph wants dropping afaict - guest_io_okay() has
> explicit calls to toggle_guest_pt(), and hence is unaffected by
> the bug here (and there is, in particular, page-table reading
> involved there). Then ...
>
>> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> I was wondering however whether we really want a full revert. I've
> locally made the change below as an alternative.
>
> Jan
>
> x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables
>
> While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
> may run when guest user mode is active, and hence may need to switch to
> the kernel page tables in order to retrieve an LDT page mapping.
>
> Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This is the alternative to fully reverting the offending commit.

Hmm yes - I think I prefer this, because we don't really want to keep
the non-kern alternative.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> however ...

> I've also been considering dropping the paging-mode-translate ASSERT(),
> now that we always have translate == external.

I'd suggest not making that change here in this bugfix.  I think we do
want to alter how we do asserts like this, and there are other similarly
impacted code blocks.

>
> --- a/xen/arch/x86/pv/mm.h
> +++ b/xen/arch/x86/pv/mm.h
> @@ -11,10 +11,14 @@ int new_guest_cr3(mfn_t mfn);
>   */
>  static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
>  {
> +    struct vcpu *curr = current;
>      l1_pgentry_t l1e;
>  
> -    ASSERT(!paging_mode_translate(current->domain));
> -    ASSERT(!paging_mode_external(current->domain));
> +    ASSERT(!paging_mode_translate(curr->domain));
> +    ASSERT(!paging_mode_external(curr->domain));
> +
> +    if ( !(curr->arch.flags & TF_kernel_mode) )

... pull this out into a variable, like the original code used to do.

bool user_mode = !(curr->arch.flags & TF_kernel_mode);

I've forgotten which static checker tripped up over this form, but one
did IIRC.

~Andrew

> +        toggle_guest_pt(curr);
>  
>      if ( unlikely(!__addr_ok(linear)) ||
>           __copy_from_user(&l1e,
> @@ -22,6 +26,9 @@ static inline l1_pgentry_t guest_get_eff
>                            sizeof(l1_pgentry_t)) )
>          l1e = l1e_empty();
>  
> +    if ( !(curr->arch.flags & TF_kernel_mode) )
> +        toggle_guest_pt(curr);
> +
>      return l1e;
>  }
>
Jan Beulich Dec. 14, 2020, 1:55 p.m. UTC | #3
On 14.12.2020 14:21, Andrew Cooper wrote:
> On 14/12/2020 08:27, Jan Beulich wrote:
>> On 11.12.2020 15:16, Andrew Cooper wrote:
>>> This reverts commit 9ff9705647646aa937b5f5c1426a64c69a62b3bd.
>>>
>>> The change is only correct in the original context of XSA-286, where Xen's use
>>> of linear pagetables was dropped.  However, performance problems
>>> interfered with that plan, and XSA-286 was fixed differently.
>>>
>>> This broke Xen's lazy faulting of the LDT for 64bit PV guests when an access
>>> was first encountered in user context.  Xen would proceed to read the
>>> registered LDT virtual address out of the user pagetables, not the kernel
>>> pagetables.
>>>
>>> Given the nature of the bug, it would have also interfered with the IO
>>> permission bitmap functionality of userspace, which similarly needs to read
>>> data using the kernel pagetables.
>> This paragraph wants dropping afaict - guest_io_okay() has
>> explicit calls to toggle_guest_pt(), and hence is unaffected by
>> the bug here (and there is, in particular, page-table reading
>> involved there). Then ...
>>
>>> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> I was wondering however whether we really want a full revert. I've
>> locally made the change below as an alternative.
>>
>> Jan
>>
>> x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables
>>
>> While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
>> may run when guest user mode is active, and hence may need to switch to
>> the kernel page tables in order to retrieve an LDT page mapping.
>>
>> Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
>> Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> This is the alternative to fully reverting the offending commit.
> 
> Hmm yes - I think I prefer this, because we don't really want to keep
> the non-kern alternative.
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> however ...

Thanks.

>> I've also been considering dropping the paging-mode-translate ASSERT(),
>> now that we always have translate == external.
> 
> I'd suggest not making that change here in this bugfix.  I think we do
> want to alter how we do asserts like this, and there are other similarly
> impacted code blocks.

Okay, I look forward to learning what exactly you have in mind.

>> --- a/xen/arch/x86/pv/mm.h
>> +++ b/xen/arch/x86/pv/mm.h
>> @@ -11,10 +11,14 @@ int new_guest_cr3(mfn_t mfn);
>>   */
>>  static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
>>  {
>> +    struct vcpu *curr = current;
>>      l1_pgentry_t l1e;
>>  
>> -    ASSERT(!paging_mode_translate(current->domain));
>> -    ASSERT(!paging_mode_external(current->domain));
>> +    ASSERT(!paging_mode_translate(curr->domain));
>> +    ASSERT(!paging_mode_external(curr->domain));
>> +
>> +    if ( !(curr->arch.flags & TF_kernel_mode) )
> 
> ... pull this out into a variable, like the original code used to do.
> 
> bool user_mode = !(curr->arch.flags & TF_kernel_mode);
> 
> I've forgotten which static checker tripped up over this form, but one
> did IIRC.

I've made the change (will send the result in a minute), but I'm curious
not so much which checker might have taken issue here but why.

Jan
Patch

diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 5d74d11cba..14cb0f2d4e 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -56,6 +56,27 @@  l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn)
 }
 
 /*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
+{
+    struct vcpu *curr = current;
+    const bool user_mode = !(curr->arch.flags & TF_kernel_mode);
+    l1_pgentry_t l1e;
+
+    if ( user_mode )
+        toggle_guest_pt(curr);
+
+    l1e = guest_get_eff_l1e(linear);
+
+    if ( user_mode )
+        toggle_guest_pt(curr);
+
+    return l1e;
+}
+
+/*
  * Map a guest's LDT page (covering the byte at @offset from start of the LDT)
  * into Xen's virtual range.  Returns true if the mapping changed, false
  * otherwise.
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index 2a21859dd4..b1b66e46c8 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -5,11 +5,8 @@  l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn);
 
 int new_guest_cr3(mfn_t mfn);
 
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
+/* Read a PV guest's l1e that maps this linear address. */
+static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
 {
     l1_pgentry_t l1e;
 
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index 8d0007ede5..7f6fbc92fb 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -342,7 +342,7 @@  int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
     bool mmio_ro;
 
     /* Attempt to read the PTE that maps the VA being accessed. */
-    pte = guest_get_eff_kern_l1e(addr);
+    pte = guest_get_eff_l1e(addr);
 
     /* We are only looking for read-only mappings */
     if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT | _PAGE_RW)) != _PAGE_PRESENT) )