[v2,1/2] x86/mm: factor out the code for shattering an l3 PTE
diff mbox series

Message ID 3375af1e708b4ec3205f493a17da6e0369249096.1575891620.git.hongyxia@amazon.com
State Superseded
Headers show
Series
  • Refactor super page shattering
Related show

Commit Message

Xia, Hongyan Dec. 9, 2019, 11:48 a.m. UTC
map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l3 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changes in v2:
- improve asm.
- re-read pl3e from memory when taking the lock.
- move the allocation of l2t inside the shatter function.
---
 xen/arch/x86/mm.c | 97 +++++++++++++++++++++++------------------------
 1 file changed, 48 insertions(+), 49 deletions(-)

Comments

Jan Beulich Dec. 10, 2019, 3:20 p.m. UTC | #1
On 09.12.2019 12:48, Hongyan Xia wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,51 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                           flush_area_local((const void *)v, f) : \
>                           flush_area_all((const void *)v, f))
>  
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e;
> +    l2_pgentry_t ol2e, *l2t = alloc_xen_pagetable();
> +
> +    if ( l2t == NULL )

Nowadays we seem to prefer !l2t in cases like this one.

> +        return -1;

-ENOMEM please (and then handed on by the caller).

> +    ol3e = *pl3e;

This could be the variable's initializer.

> +    ol2e = l2e_from_intpte(l3e_get_intpte(ol3e));

There's nothing "old" about this L2 entry, so its name would better
be just "l2e" I think.

> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +    {
> +        l2e_write(l2t + i, ol2e);
> +        ol2e = l2e_from_intpte(
> +                l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER + PAGE_SHIFT)));

Indentation looks odd here (also further down). If the first argument
of a function call doesn't fit on the line and would also be ugly to
split across lines, what we do is indent it the usual 4 characters
from the function invocation, i.e. in this case

        ol2e = l2e_from_intpte(
                   l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER + PAGE_SHIFT)));

and then slightly shorter

        ol2e = l2e_from_intpte(
                   l2e_get_intpte(ol2e) + (PAGE_SIZE << PAGETABLE_ORDER));

Of course, as mentioned before, I'm not overly happy to see type
safety lost in case like this one, where it's not needed like e.g.
further up to convert from L3 to L2 entry.

> +    }
> +    if ( locking )
> +        spin_lock(&map_pgdir_lock);
> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> +    {
> +        l3e_write_atomic(pl3e,
> +                l3e_from_paddr((paddr_t)virt_to_maddr(l2t), __PAGE_HYPERVISOR));
> +        l2t = NULL;
> +    }
> +    if ( locking )
> +        spin_unlock(&map_pgdir_lock);
> +    if ( virt )
> +    {
> +        unsigned int flush_flags =
> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> +
> +        if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )

Unnecessary pair of parentheses (which also wasn't there in the
original code).

> +                flush_flags |= FLUSH_TLB_GLOBAL;

Too deep indentation.

> +        flush_area(virt, flush_flags);
> +    }
> +    if ( l2t )
> +        free_xen_pagetable(l2t);
> +
> +    return 0;
> +}

Also please add blank lines between
- L2 population and lock acquire,
- lock release and TLB flush,
- TLB flush and free.

Jan
Xia, Hongyan Dec. 11, 2019, 10:28 a.m. UTC | #2
On Tue, 2019-12-10 at 16:20 +0100, Jan Beulich wrote:
> 
>         ol2e = l2e_from_intpte(
>                    l2e_get_intpte(ol2e) + (PAGE_SIZE <<
> PAGETABLE_ORDER));
> 
> Of course, as mentioned before, I'm not overly happy to see type
> safety lost in case like this one, where it's not needed like e.g.
> further up to convert from L3 to L2 entry.
> 

Okay, so I did a comparison between the efficiency of the assembly
under a release build.

The old "type-safe" way requires 16 instructions to prepare the first
l2e, and each iteration of the inner loop of populating l2t requires 7
instructions.

The new type-unsafe way requires 6 to prepare the first l2e, and each
iteration of populating l2t takes 5 instructions.

So the difference of populating l2t is 3600 vs. 2566 instructions,
which is not very small.

I have not tested the packed bit field way you suggested, but I think
it could even be higher than 3600 due to masking, shifting and also
overflow handling.

Hongyan
Xia, Hongyan Dec. 11, 2019, 11:02 a.m. UTC | #3
On Tue, 2019-12-10 at 16:20 +0100, Jan Beulich wrote:
> On 09.12.2019 12:48, Hongyan Xia wrote:
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5151,6 +5151,51 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> >                           flush_area_local((const void *)v, f) : \
> >                           flush_area_all((const void *)v, f))
> >  
> > +/* Shatter an l3 entry and populate l2. If virt is passed in, also
> > do flush. */
> > +static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt,
> > bool locking)
> > +{
> > +    unsigned int i;
> > +    l3_pgentry_t ol3e;
> > +    l2_pgentry_t ol2e, *l2t = alloc_xen_pagetable();
> > +
> > +    if ( l2t == NULL )
> 
> Nowadays we seem to prefer !l2t in cases like this one.
> 
> > +        return -1;
> 
> -ENOMEM please (and then handed on by the caller).
> 
> > +    ol3e = *pl3e;
> 
> This could be the variable's initializer.
> 
> > +    ol2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> 
> There's nothing "old" about this L2 entry, so its name would better
> be just "l2e" I think.
> 
> > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > +    {
> > +        l2e_write(l2t + i, ol2e);
> > +        ol2e = l2e_from_intpte(
> > +                l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER +
> > PAGE_SHIFT)));
> 
> Indentation looks odd here (also further down). If the first argument
> of a function call doesn't fit on the line and would also be ugly to
> split across lines, what we do is indent it the usual 4 characters
> from the function invocation, i.e. in this case
> 
>         ol2e = l2e_from_intpte(
>                    l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER +
> PAGE_SHIFT)));
> 
> and then slightly shorter
> 
>         ol2e = l2e_from_intpte(
>                    l2e_get_intpte(ol2e) + (PAGE_SIZE <<
> PAGETABLE_ORDER));
> 
> Of course, as mentioned before, I'm not overly happy to see type
> safety lost in case like this one, where it's not needed like e.g.
> further up to convert from L3 to L2 entry.
> 
> > +    }
> > +    if ( locking )
> > +        spin_lock(&map_pgdir_lock);
> > +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> > +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> > +    {
> > +        l3e_write_atomic(pl3e,
> > +                l3e_from_paddr((paddr_t)virt_to_maddr(l2t),
> > __PAGE_HYPERVISOR));
> > +        l2t = NULL;
> > +    }
> > +    if ( locking )
> > +        spin_unlock(&map_pgdir_lock);
> > +    if ( virt )
> > +    {
> > +        unsigned int flush_flags =
> > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > +
> > +        if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )
> 
> Unnecessary pair of parentheses (which also wasn't there in the
> original code).
> 
> > +                flush_flags |= FLUSH_TLB_GLOBAL;
> 
> Too deep indentation.
> 
> > +        flush_area(virt, flush_flags);
> > +    }
> > +    if ( l2t )
> > +        free_xen_pagetable(l2t);
> > +
> > +    return 0;
> > +}
> 
> Also please add blank lines between
> - L2 population and lock acquire,
> - lock release and TLB flush,
> - TLB flush and free.
> 
> Jan

Issues fixed in v3. I have not touched the type safety part. If we
think this is really important we can revert to what it was before,
although from the quick study I did in my previous email, there is a
performance difference.

Hongyan
Julien Grall Dec. 11, 2019, 11:10 a.m. UTC | #4
Hi Hongyan,

On 11/12/2019 10:28, Xia, Hongyan wrote:
> On Tue, 2019-12-10 at 16:20 +0100, Jan Beulich wrote:
>>
>>          ol2e = l2e_from_intpte(
>>                     l2e_get_intpte(ol2e) + (PAGE_SIZE <<
>> PAGETABLE_ORDER));
>>
>> Of course, as mentioned before, I'm not overly happy to see type
>> safety lost in case like this one, where it's not needed like e.g.
>> further up to convert from L3 to L2 entry.
>>
> 
> Okay, so I did a comparison between the efficiency of the assembly
> under a release build.
> 
> The old "type-safe" way requires 16 instructions to prepare the first
> l2e, and each iteration of the inner loop of populating l2t requires 7
> instructions.
> 
> The new type-unsafe way requires 6 to prepare the first l2e, and each
> iteration of populating l2t takes 5 instructions.
> 
> So the difference of populating l2t is 3600 vs. 2566 instructions,
> which is not very small.
While this involves more instructions, how often do we expect the code 
to be called?

Cheers,
Xia, Hongyan Dec. 11, 2019, 11:28 a.m. UTC | #5
On Wed, 2019-12-11 at 11:10 +0000, Julien Grall wrote:
> Hi Hongyan,
> ...
> 
> While this involves more instructions, how often do we expect the
> code 
> to be called?
> 
> Cheers,
> 

I don't expect this to be called very often in the current Xen.
Although with direct map removal, a lot of the memory allocations
(mostly xenheap allocations) will be mapped and unmapped on-demand and
there is a much higher change of merging/shattering.

However, the series moved all PTEs from xenheap to domheap, and we
might see other things moved to domheap in the future, so we might not
have many things left on xenheap anyway.

Hongyan
Julien Grall Dec. 11, 2019, 11:47 a.m. UTC | #6
Hi Hongyan,

On 11/12/2019 11:28, Xia, Hongyan wrote:
> On Wed, 2019-12-11 at 11:10 +0000, Julien Grall wrote:
>> Hi Hongyan,
>> ...
>>
>> While this involves more instructions, how often do we expect the
>> code
>> to be called?
>>
>> Cheers,
>>
> 
> I don't expect this to be called very often in the current Xen.
> Although with direct map removal, a lot of the memory allocations
> (mostly xenheap allocations) will be mapped and unmapped on-demand and
> there is a much higher change of merging/shattering.

Thank you for the explanation. In order to merge/shatter, you need the 
buddy allocator to ensure all the xenheap are allocated contiguously. I 
am pretty unconvinced there are a lot of page allocated via xenheap. So 
the chance to have contiguous xenheap allocation is very limited.

But the merging/shattering can be counterproductive. An example short 
memory allocation (we have a few places like that):
    xmalloc(...);
    do something
    xfree(...);

We would end up to merge and then a few ms later shatter again. So it 
feels to me, that merging is probably not worth it (I am planning to 
discuss with Andrew today about it).

> 
> However, the series moved all PTEs from xenheap to domheap, and we
> might see other things moved to domheap in the future, so we might not
> have many things left on xenheap anyway.

Typesafe is an important part of making our code base more secure than 
basic C (such as not mixing type).

In this case, I think if we start to merge/shatter a lot, then we have a 
bigger problem and we may want to consider to remove it (see above) So 
it feels the optimization is not worth it.

Note that I am not maintaining this code, so the final call is on Andrew 
and Jan.

Cheers,

Patch
diff mbox series

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d4dd80a85..6188a968ff 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,51 @@  l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
+static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l3_pgentry_t ol3e;
+    l2_pgentry_t ol2e, *l2t = alloc_xen_pagetable();
+
+    if ( l2t == NULL )
+        return -1;
+
+    ol3e = *pl3e;
+    ol2e = l2e_from_intpte(l3e_get_intpte(ol3e));
+
+    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        l2e_write(l2t + i, ol2e);
+        ol2e = l2e_from_intpte(
+                l2e_get_intpte(ol2e) + (1 << (PAGETABLE_ORDER + PAGE_SHIFT)));
+    }
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
+         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+    {
+        l3e_write_atomic(pl3e,
+                l3e_from_paddr((paddr_t)virt_to_maddr(l2t), __PAGE_HYPERVISOR));
+        l2t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
+
+        if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )
+                flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+    if ( l2t )
+        free_xen_pagetable(l2t);
+
+    return 0;
+}
+
 int map_pages_to_xen(
     unsigned long virt,
     mfn_t mfn,
@@ -5244,9 +5289,6 @@  int map_pages_to_xen(
         if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
              (l3e_get_flags(ol3e) & _PAGE_PSE) )
         {
-            unsigned int flush_flags =
-                FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
-
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
                                          L1_PAGETABLE_ENTRIES - 1)) +
@@ -5267,33 +5309,9 @@  int map_pages_to_xen(
                 continue;
             }
 
-            pl2e = alloc_xen_pagetable();
-            if ( pl2e == NULL )
+            /* Pass virt to indicate we need to flush. */
+            if ( shatter_l3e(pl3e, virt, locking) )
                 return -ENOMEM;
-
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(ol3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(ol3e)));
-
-            if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
-                flush_flags |= FLUSH_TLB_GLOBAL;
-
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            flush_area(virt, flush_flags);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5578,27 +5596,8 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            pl2e = alloc_xen_pagetable();
-            if ( !pl2e )
+            if ( shatter_l3e(pl3e, 0, locking) )
                 return -ENOMEM;
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(*pl3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(*pl3e)));
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
         }
 
         /*