[V3,(resend),13/19] x86/setup: Do not create valid mappings when directmap=no

Message ID 20240513134046.82605-14-eliasely@amazon.com (mailing list archive)
State New
Series Remove the directmap

Commit Message

Elias El Yandouzi May 13, 2024, 1:40 p.m. UTC
From: Hongyan Xia <hongyxia@amazon.com>

Create empty mappings in the second e820 pass. Also, destroy existing
direct map mappings created in the first pass.

To make xenheap pages visible in guests, it is necessary to create empty
L3 tables in the direct map even when directmap=no, since guest cr3s
copy idle domain's L4 entries, which means they will share mappings in
the direct map if we pre-populate idle domain's L4 entries and L3
tables. A helper is introduced for this.

Also, after the direct map is actually gone, we need to stop updating
the direct map in update_xen_mappings().

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Elias El Yandouzi <eliasely@amazon.com>

Comments

Roger Pau Monné May 14, 2024, 3:39 p.m. UTC | #1
On Mon, May 13, 2024 at 01:40:40PM +0000, Elias El Yandouzi wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Create empty mappings in the second e820 pass. Also, destroy existing
> direct map mappings created in the first pass.
> 
> To make xenheap pages visible in guests, it is necessary to create empty
> L3 tables in the direct map even when directmap=no, since guest cr3s
> copy idle domain's L4 entries, which means they will share mappings in
> the direct map if we pre-populate idle domain's L4 entries and L3
> tables. A helper is introduced for this.
> 
> Also, after the direct map is actually gone, we need to stop updating
> the direct map in update_xen_mappings().
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Elias El Yandouzi <eliasely@amazon.com>
> 
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index f26c9799e4..919347d8c2 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -978,6 +978,57 @@ static struct domain *__init create_dom0(const module_t *image,
>  /* How much of the directmap is prebuilt at compile time. */
>  #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
>  
> +/*
> + * This either populates a valid direct map, or allocates empty L3 tables and
> + * creates the L4 entries for virtual addresses between [start, end) in the
> + * direct map, depending on has_directmap().
> + *
> + * When directmap=no, we still need to populate empty L3 tables in the
> + * direct map region. The reason is that on-demand xenheap mappings are
> + * created in the idle domain's page table but must be seen by
> + * everyone. Since all domains share the direct map L4 entries, they
> + * will share xenheap mappings if we pre-populate the L4 entries and L3
> + * tables in the direct map region for all RAM. We also rely on the fact
> + * that L3 tables are never freed.
> + */
> +static void __init populate_directmap(uint64_t pstart, uint64_t pend,

paddr_t for both.

> +                                      unsigned int flags)
> +{
> +    unsigned long vstart = (unsigned long)__va(pstart);
> +    unsigned long vend = (unsigned long)__va(pend);
> +
> +    if ( pstart >= pend )
> +        return;
> +
> +    BUG_ON(vstart < DIRECTMAP_VIRT_START);
> +    BUG_ON(vend > DIRECTMAP_VIRT_END);
> +
> +    if ( has_directmap() )
> +        /* Populate valid direct map. */
> +        BUG_ON(map_pages_to_xen(vstart, maddr_to_mfn(pstart),
> +                                PFN_DOWN(pend - pstart), flags));
> +    else
> +    {
> +        /* Create empty L3 tables. */
> +        unsigned long vaddr = vstart & ~((1UL << L4_PAGETABLE_SHIFT) - 1);
> +
> +        for ( ; vaddr < vend; vaddr += (1UL << L4_PAGETABLE_SHIFT) )

It might be clearer (by avoiding some of the bitops and masks) to simply
do:

for ( unsigned int idx = l4_table_offset(vstart);
      idx <= l4_table_offset(vend);
      idx++ )
{
...
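
Filled out with the body the patch already has, the suggested loop would
look roughly like this (illustrative sketch only, untested):

for ( unsigned int idx = l4_table_offset(vstart);
      idx <= l4_table_offset(vend - 1);
      idx++ )
{
    l4_pgentry_t *pl4e = &idle_pg_table[idx];

    if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
    {
        mfn_t mfn = alloc_boot_pages(1, 1);
        void *v = map_domain_page(mfn);

        clear_page(v);
        UNMAP_DOMAIN_PAGE(v);
        l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));
    }
}

(the "- 1" on the upper bound is only because vend is exclusive; with vend
exactly L4-aligned, l4_table_offset(vend) would name one slot too many)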

> +        {
> +            l4_pgentry_t *pl4e = &idle_pg_table[l4_table_offset(vaddr)];
> +
> +            if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
> +            {
> +                mfn_t mfn = alloc_boot_pages(1, 1);

Hm, why not use alloc_xen_pagetable()?

> +                void *v = map_domain_page(mfn);
> +
> +                clear_page(v);
> +                UNMAP_DOMAIN_PAGE(v);

Maybe use clear_domain_page()?
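I.e. (sketch only, assuming the helper is fine to use this early during
boot) the map/clear/unmap sequence would reduce to:

    mfn_t mfn = alloc_boot_pages(1, 1);

    clear_domain_page(mfn);
    l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));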

> +                l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));
> +            }
> +        }
> +    }
> +}
> +
>  void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
>  {
>      const char *memmap_type = NULL, *loader, *cmdline = "";
> @@ -1601,8 +1652,17 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
>          map_e = min_t(uint64_t, e,
>                        ARRAY_SIZE(l2_directmap) << L2_PAGETABLE_SHIFT);
>  
> -        /* Pass mapped memory to allocator /before/ creating new mappings. */
> +        /*
> +         * Pass mapped memory to allocator /before/ creating new mappings.
> +         * The direct map for the bottom 4GiB has been populated in the first
> +         * e820 pass. In the second pass, we make sure those existing mappings
> +         * are destroyed when directmap=no.

Quite likely a stupid question, but why has the directmap been
populated for memory below 4GB?  IOW: why do we need to create those
mappings just to have them destroyed here?

Thanks, Roger.
Jan Beulich May 15, 2024, 3:50 p.m. UTC | #2
On 14.05.2024 17:39, Roger Pau Monné wrote:
> On Mon, May 13, 2024 at 01:40:40PM +0000, Elias El Yandouzi wrote:
>> +        {
>> +            l4_pgentry_t *pl4e = &idle_pg_table[l4_table_offset(vaddr)];
>> +
>> +            if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
>> +            {
>> +                mfn_t mfn = alloc_boot_pages(1, 1);
> 
> Hm, why not use alloc_xen_pagetable()?
> 
>> +                void *v = map_domain_page(mfn);
>> +
>> +                clear_page(v);
>> +                UNMAP_DOMAIN_PAGE(v);
> 
> Maybe use clear_domain_page()?

Or else use unmap_domain_page(). v is going out of scope just afterwards,
and UNMAP_DOMAIN_PAGE() is intended to be used when that's not the case.
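I.e. (illustrative):

    void *v = map_domain_page(mfn);

    clear_page(v);
    unmap_domain_page(v);    /* fine - v goes out of scope right here */

whereas the upper-case UNMAP_DOMAIN_PAGE() additionally sets the pointer
to NULL after unmapping, which only matters while the variable remains in
scope afterwards.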

Jan
Jan Beulich May 15, 2024, 3:59 p.m. UTC | #3
On 13.05.2024 15:40, Elias El Yandouzi wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Create empty mappings in the second e820 pass. Also, destroy existing
> direct map mappings created in the first pass.
> 
> To make xenheap pages visible in guests, it is necessary to create empty
> L3 tables in the direct map even when directmap=no, since guest cr3s
> copy idle domain's L4 entries, which means they will share mappings in
> the direct map if we pre-populate idle domain's L4 entries and L3
> tables. A helper is introduced for this.

Hmm. On one hand this may reduce memory consumption some, when large
ranges of MFNs aren't allocated as Xen heap pages. Otoh this increases
memory needs when Xen heap pages actually need mapping. I wonder whether
the (presumably less intrusive) change of merely altering permissions
from PAGE_HYPERVISOR to _PAGE_NONE|MAP_SMALL_PAGES wouldn't be better.
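Spelled out (sketch only, untested), that alternative for the
!has_directmap() case would keep the existing call shape and merely change
the flags:

    /*
     * Less intrusive alternative as suggested: still cover the range via
     * map_pages_to_xen(), but with non-present, small-page entries rather
     * than skipping the mapping altogether. How map_pages_to_xen()
     * handles non-present entries (whether intermediate tables get
     * allocated) would need checking.
     */
    map_pages_to_xen(vstart, maddr_to_mfn(pstart),
                     PFN_DOWN(pend - pstart),
                     _PAGE_NONE | MAP_SMALL_PAGES);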

> Also, after the direct map is actually gone, we need to stop updating
> the direct map in update_xen_mappings().

What is this about? You only alter setup.c here.

Jan

Patch

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f26c9799e4..919347d8c2 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -978,6 +978,57 @@  static struct domain *__init create_dom0(const module_t *image,
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
+/*
+ * This either populates a valid direct map, or allocates empty L3 tables and
+ * creates the L4 entries for virtual addresses between [start, end) in the
+ * direct map, depending on has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L3 tables in the
+ * direct map region. The reason is that on-demand xenheap mappings are
+ * created in the idle domain's page table but must be seen by
+ * everyone. Since all domains share the direct map L4 entries, they
+ * will share xenheap mappings if we pre-populate the L4 entries and L3
+ * tables in the direct map region for all RAM. We also rely on the fact
+ * that L3 tables are never freed.
+ */
+static void __init populate_directmap(uint64_t pstart, uint64_t pend,
+                                      unsigned int flags)
+{
+    unsigned long vstart = (unsigned long)__va(pstart);
+    unsigned long vend = (unsigned long)__va(pend);
+
+    if ( pstart >= pend )
+        return;
+
+    BUG_ON(vstart < DIRECTMAP_VIRT_START);
+    BUG_ON(vend > DIRECTMAP_VIRT_END);
+
+    if ( has_directmap() )
+        /* Populate valid direct map. */
+        BUG_ON(map_pages_to_xen(vstart, maddr_to_mfn(pstart),
+                                PFN_DOWN(pend - pstart), flags));
+    else
+    {
+        /* Create empty L3 tables. */
+        unsigned long vaddr = vstart & ~((1UL << L4_PAGETABLE_SHIFT) - 1);
+
+        for ( ; vaddr < vend; vaddr += (1UL << L4_PAGETABLE_SHIFT) )
+        {
+            l4_pgentry_t *pl4e = &idle_pg_table[l4_table_offset(vaddr)];
+
+            if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
+            {
+                mfn_t mfn = alloc_boot_pages(1, 1);
+                void *v = map_domain_page(mfn);
+
+                clear_page(v);
+                UNMAP_DOMAIN_PAGE(v);
+                l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));
+            }
+        }
+    }
+}
+
 void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
 {
     const char *memmap_type = NULL, *loader, *cmdline = "";
@@ -1601,8 +1652,17 @@  void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
         map_e = min_t(uint64_t, e,
                       ARRAY_SIZE(l2_directmap) << L2_PAGETABLE_SHIFT);
 
-        /* Pass mapped memory to allocator /before/ creating new mappings. */
+        /*
+         * Pass mapped memory to allocator /before/ creating new mappings.
+         * The direct map for the bottom 4GiB has been populated in the first
+         * e820 pass. In the second pass, we make sure those existing mappings
+         * are destroyed when directmap=no.
+         */
         init_boot_pages(s, min(map_s, e));
+        if ( !has_directmap() )
+            destroy_xen_mappings((unsigned long)__va(s),
+                                 (unsigned long)__va(min(map_s, e)));
+
         s = map_s;
         if ( s < map_e )
         {
@@ -1610,6 +1670,9 @@  void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
             map_s = (s + mask) & ~mask;
             map_e &= ~mask;
             init_boot_pages(map_s, map_e);
+            if ( !has_directmap() )
+                destroy_xen_mappings((unsigned long)__va(map_s),
+                                     (unsigned long)__va(map_e));
         }
 
         if ( map_s > map_e )
@@ -1623,8 +1686,7 @@  void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
 
             if ( map_e < end )
             {
-                map_pages_to_xen((unsigned long)__va(map_e), maddr_to_mfn(map_e),
-                                 PFN_DOWN(end - map_e), PAGE_HYPERVISOR);
+                populate_directmap(map_e, end, PAGE_HYPERVISOR);
                 init_boot_pages(map_e, end);
                 map_e = end;
             }
@@ -1633,13 +1695,11 @@  void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
         {
             /* This range must not be passed to the boot allocator and
              * must also not be mapped with _PAGE_GLOBAL. */
-            map_pages_to_xen((unsigned long)__va(map_e), maddr_to_mfn(map_e),
-                             PFN_DOWN(e - map_e), __PAGE_HYPERVISOR_RW);
+            populate_directmap(map_e, e, __PAGE_HYPERVISOR_RW);
         }
         if ( s < map_s )
         {
-            map_pages_to_xen((unsigned long)__va(s), maddr_to_mfn(s),
-                             PFN_DOWN(map_s - s), PAGE_HYPERVISOR);
+            populate_directmap(s, map_s, PAGE_HYPERVISOR);
             init_boot_pages(s, map_s);
         }
     }