x86/mm: also flush TLB when putting writable foreign page reference

Message ID 58F8EF4D0200007800152882@prv-mh.provo.novell.com (mailing list archive)
State New, archived

Commit Message

Jan Beulich April 20, 2017, 3:26 p.m. UTC
Jann's explanation of the problem:

"start situation:
 - domain A and domain B are PV domains
 - domain A and B both have currently scheduled vCPUs, and the vCPUs
   are not scheduled away
 - domain A has XSM_TARGET access to domain B
 - page X is owned by domain B and has no mappings
 - page X is zeroed

 steps:
 - domain A uses do_mmu_update() to map page X in domain A as writable
 - domain A accesses page X through the new PTE, creating a TLB entry
 - domain A removes its mapping of page X
   - type count of page X goes to 0
   - tlbflush_timestamp of page X is bumped
 - domain B maps page X as L1 pagetable
   - type of page X changes to PGT_l1_page_table
   - TLB flush is forced using domain_dirty_cpumask of domain B
   - page X is mapped as L1 pagetable in domain B

 At this point, domain B's vCPUs are guaranteed to have no
 incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
 vCPUs can still have stale TLB entries that map page X as writable,
 permitting domain A to control a live pagetable of domain B."

Domain A necessarily is Dom0 (DomU-s with XSM_TARGET permission are
being created only for HVM domains, but domain B needs to be PV here),
so this is not a security issue, but nevertheless seems desirable to
correct.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
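
The sequence in the explanation above can be replayed with a toy model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical and none of it is Xen code. Each domain's TLB is reduced to a single flag, and the `flush_unmapper` parameter stands for the flush of the unmapping domain that this patch adds.

```c
#include <stdbool.h>

/* Hypothetical page types; loosely mirrors "writable" vs PGT_l1_page_table. */
enum toy_type { TOY_NONE, TOY_WRITABLE, TOY_L1_PAGETABLE };

struct toy_page {
    enum toy_type type;
    unsigned int type_count;
};

/* One flag per domain: does any of its CPUs cache a writable translation? */
struct toy_domain {
    bool tlb_has_writable_entry;
};

/* Domain d maps the page writable and touches it, filling its TLB. */
static void toy_map_writable(struct toy_domain *d, struct toy_page *pg)
{
    pg->type = TOY_WRITABLE;
    pg->type_count++;
    d->tlb_has_writable_entry = true;
}

/* Domain d drops its mapping; optionally flush d's TLB while doing so. */
static void toy_unmap(struct toy_domain *d, struct toy_page *pg,
                      bool flush_unmapper)
{
    if (flush_unmapper)
        d->tlb_has_writable_entry = false;
    pg->type_count--;
}

/* The owner retypes the page; only the owner's TLB is flushed here. */
static bool toy_make_l1(struct toy_domain *owner, struct toy_page *pg)
{
    if (pg->type_count != 0)
        return false;
    pg->type = TOY_L1_PAGETABLE;
    pg->type_count = 1;
    owner->tlb_has_writable_entry = false;
    return true;
}

/* True if domain A ends up with a stale writable entry to B's page table. */
static bool toy_scenario(bool flush_unmapper)
{
    struct toy_domain a = { false }, b = { false };
    struct toy_page x = { TOY_NONE, 0 };

    toy_map_writable(&a, &x);
    toy_unmap(&a, &x, flush_unmapper);
    if (!toy_make_l1(&b, &x))
        return false;
    return a.tlb_has_writable_entry && x.type == TOY_L1_PAGETABLE;
}
```

In this model, `toy_scenario(false)` leaves domain A holding a writable translation to a page that domain B now uses as an L1 page table, while `toy_scenario(true)` does not.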

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -602,6 +602,20 @@ static inline void guest_get_eff_kern_l1
     TOGGLE_MODE();
 }
 
+static const cpumask_t *get_flush_tlb_mask(const struct page_info *page,
+                                           const struct domain *d)
+{
+    cpumask_t *mask = this_cpu(scratch_cpumask);
+
+    BUG_ON(in_irq());
+    cpumask_copy(mask, d->domain_dirty_cpumask);
+
+    /* Don't flush if the timestamp is old enough */
+    tlbflush_filter(mask, page->tlbflush_timestamp);
+
+    return mask;
+}
+
 const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
     zero_page[PAGE_SIZE];
 
@@ -1266,6 +1280,23 @@ void put_page_from_l1e(l1_pgentry_t l1e,
     if ( (l1e_get_flags(l1e) & _PAGE_RW) && 
          ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
     {
+        /*
+         * Don't leave stale writable TLB entries in the unmapping domain's
+         * page tables, to prevent them allowing access to pages required to
+         * be read-only (e.g. after pg_owner changed them to page table or
+         * segment descriptor pages).
+         */
+        if ( unlikely(l1e_owner != pg_owner) )
+        {
+            const cpumask_t *mask = get_flush_tlb_mask(page, l1e_owner);
+
+            if ( !cpumask_empty(mask) )
+            {
+                perfc_incr(need_flush_tlb_flush);
+                flush_tlb_mask(mask);
+            }
+        }
+
         put_page_and_type(page);
     }
     else
@@ -2545,13 +2576,7 @@ static int __get_page_type(struct page_i
                  * may be unnecessary (e.g., page was GDT/LDT) but those 
                  * circumstances should be very rare.
                  */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                BUG_ON(in_irq());
-                cpumask_copy(mask, d->domain_dirty_cpumask);
-
-                /* Don't flush if the timestamp is old enough */
-                tlbflush_filter(mask, page->tlbflush_timestamp);
+                const cpumask_t *mask = get_flush_tlb_mask(page, d);
 
                 if ( unlikely(!cpumask_empty(mask)) &&
                      /* Shadow mode: track only writable pages. */

Comments

Jann Horn April 20, 2017, 4:05 p.m. UTC | #1
On Thu, Apr 20, 2017 at 5:26 PM, Jan Beulich <JBeulich@suse.com> wrote:
> Jann's explanation of the problem:
>
> "start situation:
>  - domain A and domain B are PV domains
>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>    are not scheduled away
>  - domain A has XSM_TARGET access to domain B
>  - page X is owned by domain B and has no mappings
>  - page X is zeroed
>
>  steps:
>  - domain A uses do_mmu_update() to map page X in domain A as writable
>  - domain A accesses page X through the new PTE, creating a TLB entry
>  - domain A removes its mapping of page X
>    - type count of page X goes to 0
>    - tlbflush_timestamp of page X is bumped
>  - domain B maps page X as L1 pagetable
>    - type of page X changes to PGT_l1_page_table
>    - TLB flush is forced using domain_dirty_cpumask of domain B
>    - page X is mapped as L1 pagetable in domain B
>
>  At this point, domain B's vCPUs are guaranteed to have no
>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>  vCPUs can still have stale TLB entries that map page X as writable,
>  permitting domain A to control a live pagetable of domain B."
>
> Domain A necessarily is Dom0 (DomU-s with XSM_TARGET permission are
> being created only for HVM domains, but domain B needs to be PV here),
> so this is not a security issue, but nevertheless seems desirable to
> correct.
>
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -602,6 +602,20 @@ static inline void guest_get_eff_kern_l1
>      TOGGLE_MODE();
>  }
>
> +static const cpumask_t *get_flush_tlb_mask(const struct page_info *page,
> +                                           const struct domain *d)
> +{
> +    cpumask_t *mask = this_cpu(scratch_cpumask);
> +
> +    BUG_ON(in_irq());
> +    cpumask_copy(mask, d->domain_dirty_cpumask);
> +
> +    /* Don't flush if the timestamp is old enough */
> +    tlbflush_filter(mask, page->tlbflush_timestamp);
> +
> +    return mask;
> +}
> +
>  const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
>      zero_page[PAGE_SIZE];
>
> @@ -1266,6 +1280,23 @@ void put_page_from_l1e(l1_pgentry_t l1e,
>      if ( (l1e_get_flags(l1e) & _PAGE_RW) &&
>           ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
>      {
> +        /*
> +         * Don't leave stale writable TLB entries in the unmapping domain's
> +         * page tables, to prevent them allowing access to pages required to
> +         * be read-only (e.g. after pg_owner changed them to page table or
> +         * segment descriptor pages).
> +         */
> +        if ( unlikely(l1e_owner != pg_owner) )
> +        {
> +            const cpumask_t *mask = get_flush_tlb_mask(page, l1e_owner);
> +
> +            if ( !cpumask_empty(mask) )
> +            {
> +                perfc_incr(need_flush_tlb_flush);
> +                flush_tlb_mask(mask);
> +            }
> +        }

Why does this use a flush masked with page->tlbflush_timestamp?
Shouldn't it force an unconditional flush on the whole domain, similar to
gnttab_flush_tlb()?

Also: I think the same issue might apply to read-only mappings where
the page is released back to the domain heap afterwards. The current fix
only covers writable mappings.
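
The two flush policies contrasted in the first question can be sketched as follows. This is a simplified model, not the Xen implementation: the `toy_` names, the 4-CPU limit and the plain `uint32_t` bitmask are assumptions made for illustration, and any clock wrap-around handling in the real filter is ignored. It shows only that a timestamp filter can drop CPUs from the mask, whereas an unconditional flush targets every CPU the domain has dirtied.

```c
#include <stdint.h>

#define TOY_NR_CPUS 4u

/* Hypothetical per-CPU record of when each CPU last flushed its TLB. */
static uint32_t toy_cpu_flush_time[TOY_NR_CPUS];

/*
 * Filtered policy: start from the domain's dirty CPUs and drop every CPU
 * whose last flush is newer than the page's timestamp.
 */
static uint32_t toy_filtered_mask(uint32_t dirty_mask, uint32_t page_stamp)
{
    uint32_t mask = dirty_mask;

    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (toy_cpu_flush_time[cpu] > page_stamp)
            mask &= ~(UINT32_C(1) << cpu);

    return mask;
}

/* Unconditional policy: flush every CPU the domain has dirtied. */
static uint32_t toy_unconditional_mask(uint32_t dirty_mask)
{
    return dirty_mask;
}
```

With a dirty mask of CPUs 0 and 1, a page timestamp of 5, and last-flush times of 10 and 3, the filtered mask keeps only CPU 1, while the unconditional mask keeps both. Whether the filtered result is safe depends on the page's timestamp being current at the time of the flush, which is the point the question raises.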