
[v4,1/2] x86/mm: factor out the code for shattering an l3 PTE

Message ID 77acf62cab293ae99bd1fc079e1b0853faaf1242.1576154413.git.hongyxia@amazon.com (mailing list archive)
State New, archived
Series Refactor super page shattering

Commit Message

Xia, Hongyan Dec. 12, 2019, 12:46 p.m. UTC
map_pages_to_xen and modify_xen_mappings perform almost exactly the same
operations when shattering an l3 PTE; the only difference is whether we
want to flush. Factor the common logic out into a single shatter_l3e()
helper.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changes in v4:
- use false/true instead of -1/0 to indicate failure/success.
- remove unnecessary cast.

Changes in v3:
- style and indentation changes.
- return -ENOMEM instead of -1.

Changes in v2:
- improve asm.
- re-read pl3e from memory when taking the lock.
- move the allocation of l2t inside the shatter function.
---
 xen/arch/x86/mm.c | 98 +++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

Comments

Andrew Cooper Dec. 13, 2019, 2:19 p.m. UTC | #1
On 12/12/2019 12:46, Hongyan Xia wrote:
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 7d4dd80a85..8def4fb8d8 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                           flush_area_local((const void *)v, f) : \
>                           flush_area_all((const void *)v, f))
>  
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e = *pl3e;
> +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> +
> +    if ( !l2t )
> +        return false;
> +
> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +    {
> +        l2e_write(l2t + i, l2e);
> +        l2e = l2e_from_intpte(
> +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
> +    }
> +
> +    if ( locking )
> +        spin_lock(&map_pgdir_lock);
> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> +    {
> +        l3e_write_atomic(pl3e,
> +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
> +        l2t = NULL;
> +    }
> +    if ( locking )
> +        spin_unlock(&map_pgdir_lock);
> +
> +    if ( virt )
> +    {
> +        unsigned int flush_flags =
> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> +
> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> +            flush_flags |= FLUSH_TLB_GLOBAL;

Another problematic use of ol3e which is racy on conflict.  You need to
strictly use the content of *pl3e from within the locked region.

However, why have you moved the flushing in here?  Only one of the two
callers actually wanted it, and even then I'm not totally sure it is
necessary.

Both callers operate on an arbitrary range of addresses, and for
anything other than a singleton update, will want to issue a single
flush at the end, rather than a spate of flushes for sub-areas.

(Although someone really please check my reasoning here for the
map_pages_to_xen() case which currently does have sub-area flushing.)

Either the flush wants dropping (and best via a prereq patch altering
map_pages_to_xen()), or you need to cache ol3e in the locked region with
ACCESS_ONCE() or equivalent.
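
A minimal sketch of the latter option (illustrative only, not a tested
change — the point is just that the entry used to derive the flush flags
is re-read under the lock rather than sampled before it):

    if ( locking )
        spin_lock(&map_pgdir_lock);
    ol3e = ACCESS_ONCE(*pl3e);          /* snapshot taken under the lock */
    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
         (l3e_get_flags(ol3e) & _PAGE_PSE) )
    {
        l3e_write_atomic(pl3e,
            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
        l2t = NULL;
    }
    if ( locking )
        spin_unlock(&map_pgdir_lock);

    /* ... and later derive the flush flags from that snapshot ... */
    if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
        flush_flags |= FLUSH_TLB_GLOBAL;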

> +        flush_area(virt, flush_flags);
> +    }
> +
> +    if ( l2t )
> +        free_xen_pagetable(l2t);

Mind annotating this as:

    if ( l2t ) /* Raced on trying to shatter?  Throw away our work. */

to highlight that this is an error path, and there is no connection
between the TLB flushing and pagetable freeing.

~Andrew
Jan Beulich Dec. 13, 2019, 2:36 p.m. UTC | #2
On 13.12.2019 15:19, Andrew Cooper wrote:
> On 12/12/2019 12:46, Hongyan Xia wrote:
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 7d4dd80a85..8def4fb8d8 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>                           flush_area_local((const void *)v, f) : \
>>                           flush_area_all((const void *)v, f))
>>  
>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
>> +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
>> +{
>> +    unsigned int i;
>> +    l3_pgentry_t ol3e = *pl3e;
>> +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>> +    l2_pgentry_t *l2t = alloc_xen_pagetable();
>> +
>> +    if ( !l2t )
>> +        return false;
>> +
>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>> +    {
>> +        l2e_write(l2t + i, l2e);
>> +        l2e = l2e_from_intpte(
>> +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
>> +    }
>> +
>> +    if ( locking )
>> +        spin_lock(&map_pgdir_lock);
>> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
>> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
>> +    {
>> +        l3e_write_atomic(pl3e,
>> +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
>> +        l2t = NULL;
>> +    }
>> +    if ( locking )
>> +        spin_unlock(&map_pgdir_lock);
>> +
>> +    if ( virt )
>> +    {
>> +        unsigned int flush_flags =
>> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
>> +
>> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
>> +            flush_flags |= FLUSH_TLB_GLOBAL;
> 
> Another problematic use of ol3e which is racy on conflict.  You need to
> strictly use the content of *pl3e from within the locked region.

But this isn't a problem introduced here, i.e. fixing of it doesn't
strictly fall under "re-factor". (I'm certainly not opposed to
getting this right at the same time.)

> However, why have you moved the flushing in here?  Only one of the two
> callers actually wanted it, and even then I'm not totally sure it is
> necessary.
> 
> Both callers operate on an arbitrary range of addresses, and for
> anything other than a singleton update, will want to issue a single
> flush at the end, rather than a spate of flushes for sub-areas.
> 
> (Although someone really please check my reasoning here for the
> map_pages_to_xen() case which currently does have sub-area flushing.)
> 
> Either the flush wants dropping (and best via a prereq patch altering
> map_pages_to_xen()), or you need to cache ol3e in the locked region with
> ACCESS_ONCE() or equivalent.

Well, at best replacing by a single one at the end, but I guess
the current piecemeal behavior is to cope with error paths (see
Julien's report against modify_xen_mappings(), where it's
exactly the other way around). Considering especially speculative
accesses I think it isn't the worst idea to keep the window small
between shatter and flush (short of us doing a proper break-then-make
sequence).
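
(Purely for illustration — not what the patch does — a break-then-make
shatter would roughly look like the below, at the obvious cost of a
window in which the range is unmapped:

    /* Break: zap the superpage mapping first, then flush it out. */
    l3e_write_atomic(pl3e, l3e_empty());
    flush_area(virt, FLUSH_TLB | FLUSH_TLB_GLOBAL |
                     FLUSH_ORDER(2 * PAGETABLE_ORDER));

    /* Make: only now install the freshly populated l2 table. */
    l3e_write_atomic(pl3e,
        l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
)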

Jan
Andrew Cooper Dec. 13, 2019, 2:58 p.m. UTC | #3
On 13/12/2019 14:36, Jan Beulich wrote:
> On 13.12.2019 15:19, Andrew Cooper wrote:
>> On 12/12/2019 12:46, Hongyan Xia wrote:
>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>>> index 7d4dd80a85..8def4fb8d8 100644
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>>                           flush_area_local((const void *)v, f) : \
>>>                           flush_area_all((const void *)v, f))
>>>  
>>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
>>> +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
>>> +{
>>> +    unsigned int i;
>>> +    l3_pgentry_t ol3e = *pl3e;
>>> +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>>> +    l2_pgentry_t *l2t = alloc_xen_pagetable();
>>> +
>>> +    if ( !l2t )
>>> +        return false;
>>> +
>>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +    {
>>> +        l2e_write(l2t + i, l2e);
>>> +        l2e = l2e_from_intpte(
>>> +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
>>> +    }
>>> +
>>> +    if ( locking )
>>> +        spin_lock(&map_pgdir_lock);
>>> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
>>> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
>>> +    {
>>> +        l3e_write_atomic(pl3e,
>>> +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
>>> +        l2t = NULL;
>>> +    }
>>> +    if ( locking )
>>> +        spin_unlock(&map_pgdir_lock);
>>> +
>>> +    if ( virt )
>>> +    {
>>> +        unsigned int flush_flags =
>>> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
>>> +
>>> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
>>> +            flush_flags |= FLUSH_TLB_GLOBAL;
>> Another problematic use of ol3e which is racy on conflict.  You need to
>> strictly use the content of *pl3e from within the locked region.
> But this isn't a problem introduced here, i.e. fixing of it doesn't
> strictly fall under "re-factor". (I'm certainly not opposed to
> getting this right at the same time.)

It is brand new code which is racy.  It's either not necessary, or an
XSA-in-waiting.  (And not necessary, AFAICT).

>
>> However, why have you moved the flushing in here?  Only one of the two
>> callers actually wanted it, and even then I'm not totally sure it is
>> necessary.
>>
>> Both callers operate on an arbitrary range of addresses, and for
>> anything other than a singleton update, will want to issue a single
>> flush at the end, rather than a spate of flushes for sub-areas.
>>
>> (Although someone really please check my reasoning here for the
>> map_pages_to_xen() case which currently does have sub-area flushing.)
>>
>> Either the flush wants dropping (and best via a prereq patch altering
>> map_pages_to_xen()), or you need to cache ol3e in the locked region with
>> ACCESS_ONCE() or equivalent.
> Well, at best replacing by a single one at the end, but I guess
> the current piecemeal behavior is to cope with error paths (see
> Julien's report against modify_xen_mappings(), where it's
> exactly the other way around). Considering especially speculative
> accesses I think it isn't the worst idea to keep the window small
> between shatter and flush (short of us doing a proper break-then-make
> sequence).

Every sub-flush is a broadcast IPI, which is a scalability concern, and
at least needs considering.

x86 is designed not to need BBM, although the BBM sequence can be
helpful at times to simplify other reasoning.  It might actually be
necessary in some SMP cases.

Speculation can bite you at any point, including the very next
instruction.  The logic is either correct, or not correct, and the
distance between the PTE update and the flush is only relevant when it
comes to the scarcity of the incorrect case manifesting in a noticeable way.

Fundamentally, we either need BBM, or the flush is safe to defer to the
end.  Everything in-between is racy, and dropping the sub-flushes would
make any incorrect cases more likely to manifest.
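
As a sketch of the deferred variant (hypothetical signature, just to show
the shape): shatter_l3e() would only accumulate the flags it needs, and
the caller would issue one flush after the whole range has been walked:

    unsigned int flush_flags = FLUSH_TLB;   /* order/global bits added below */

    /* Hypothetical signature: shatter_l3e() ORs what it needs into *flush_flags. */
    if ( !shatter_l3e(pl3e, locking, &flush_flags) )
        return -ENOMEM;

    /* ... continue over the rest of the range ... */

    flush_area(virt, flush_flags);          /* one flush for the whole range */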

~Andrew
Xia, Hongyan Dec. 13, 2019, 3:55 p.m. UTC | #4
Hi,

On Fri, 2019-12-13 at 14:19 +0000, Andrew Cooper wrote:
> On 12/12/2019 12:46, Hongyan Xia wrote:
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 7d4dd80a85..8def4fb8d8 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> >                           flush_area_local((const void *)v, f) : \
> >                           flush_area_all((const void *)v, f))
> >  
> > +/* Shatter an l3 entry and populate l2. If virt is passed in, also
> > do flush. */
> > +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt,
> > bool locking)
> > +{
> > +    unsigned int i;
> > +    l3_pgentry_t ol3e = *pl3e;
> > +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> > +
> > +    if ( !l2t )
> > +        return false;
> > +
> > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > +    {
> > +        l2e_write(l2t + i, l2e);
> > +        l2e = l2e_from_intpte(
> > +                  l2e_get_intpte(l2e) + (PAGE_SIZE <<
> > PAGETABLE_ORDER));
> > +    }
> > +
> > +    if ( locking )
> > +        spin_lock(&map_pgdir_lock);
> > +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> > +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> > +    {
> > +        l3e_write_atomic(pl3e,
> > +            l3e_from_paddr(virt_to_maddr(l2t),
> > __PAGE_HYPERVISOR));
> > +        l2t = NULL;
> > +    }
> > +    if ( locking )
> > +        spin_unlock(&map_pgdir_lock);
> > +
> > +    if ( virt )
> > +    {
> > +        unsigned int flush_flags =
> > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > +
> > +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> > +            flush_flags |= FLUSH_TLB_GLOBAL;
> 
> Another problematic use of ol3e which is racy on conflict.  You need
> to
> strictly use the content of *pl3e from within the locked region.
> 

This is actually just refactoring, although if the original code is
wrong, it is also worth fixing.

In fact, in the last couple of days, the more I read the code in
map_pages_to_xen, the more worried I am about its race conditions and
correctness. The lock is mostly taken around writes, so there are many
reads outside the locked region which could potentially see stale
values. One example I found is (after refactoring):

             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
                 ...
                 /* Pass virt to indicate we need to flush. */
                 if ( !shatter_l2e(pl2e, virt, locking) )
                     return -ENOMEM;
             }

             pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);

It tries to shatter an l2 superpage before accessing pl1e, but is there
any guard between the shattering and the read of pl1e? If another call
comes in between the two and merges this page back into a superpage,
pl1e then points into the superpage itself instead of an L1 table page!
(Please check my logic.) Also in other places, we see races between PTE
modifications and flushes.

There could be more examples like this. Of course, removing the code
for merging can avoid a lot of the problems, although Julien explained
to me that it could be useful during boot. If removing is not an
option, is it a big problem to extend the lock, e.g., to the whole
function? It is mostly just used by vmap after boot, and a larger lock
can simplify this function and its logic significantly. vmap is already
taking other global locks before map_pages_to_xen anyway though.
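
(As one possible shape — just a sketch of the idea, untested, and a
whole-function lock would subsume it — the pl1e derivation above could
re-validate the l2 entry under map_pgdir_lock before use:

    if ( locking )
        spin_lock(&map_pgdir_lock);
    if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
         (l2e_get_flags(*pl2e) & _PAGE_PSE) )
    {
        /* Re-merged or zapped behind our back: unlock and retry. */
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        goto redo_l2;                   /* hypothetical retry label */
    }
    pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(virt);
    if ( locking )
        spin_unlock(&map_pgdir_lock);
)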

Hongyan
Xia, Hongyan Dec. 13, 2019, 4:02 p.m. UTC | #5
On Fri, 2019-12-13 at 14:58 +0000, Andrew Cooper wrote:
> On 13/12/2019 14:36, Jan Beulich wrote:
> > On 13.12.2019 15:19, Andrew Cooper wrote:
> > > On 12/12/2019 12:46, Hongyan Xia wrote:
> > > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > > > index 7d4dd80a85..8def4fb8d8 100644
> > > > --- a/xen/arch/x86/mm.c
> > > > +++ b/xen/arch/x86/mm.c
> > > > @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned
> > > > long v)
> > > >                           flush_area_local((const void *)v, f)
> > > > : \
> > > >                           flush_area_all((const void *)v, f))
> > > >  
> > > > +/* Shatter an l3 entry and populate l2. If virt is passed in,
> > > > also do flush. */
> > > > +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long
> > > > virt, bool locking)
> > > > +{
> > > > +    unsigned int i;
> > > > +    l3_pgentry_t ol3e = *pl3e;
> > > > +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > > > +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> > > > +
> > > > +    if ( !l2t )
> > > > +        return false;
> > > > +
> > > > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > > > +    {
> > > > +        l2e_write(l2t + i, l2e);
> > > > +        l2e = l2e_from_intpte(
> > > > +                  l2e_get_intpte(l2e) + (PAGE_SIZE <<
> > > > PAGETABLE_ORDER));
> > > > +    }
> > > > +
> > > > +    if ( locking )
> > > > +        spin_lock(&map_pgdir_lock);
> > > > +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> > > > +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> > > > +    {
> > > > +        l3e_write_atomic(pl3e,
> > > > +            l3e_from_paddr(virt_to_maddr(l2t),
> > > > __PAGE_HYPERVISOR));
> > > > +        l2t = NULL;
> > > > +    }
> > > > +    if ( locking )
> > > > +        spin_unlock(&map_pgdir_lock);
> > > > +
> > > > +    if ( virt )
> > > > +    {
> > > > +        unsigned int flush_flags =
> > > > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > > > +
> > > > +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> > > > +            flush_flags |= FLUSH_TLB_GLOBAL;
> > > 
> > > Another problematic use of ol3e which is racy on conflict.  You
> > > need to
> > > strictly use the content of *pl3e from within the locked region.
> > 
> > But this isn't a problem introduced here, i.e. fixing of it doesn't
> > strictly fall under "re-factor". (I'm certainly not opposed to
> > getting this right at the same time.)
> 
> It is brand new code which is racy.  It's either not necessary, or an
> XSA-in-waiting.  (And not necessary, AFAICT).
> 

I am really confused. The original code already does sub-region
flushes, and it uses a flush flag from ol3e that is even more outdated
than the refactored version, so I am not quite getting your point. I
hope I am not missing something obvious.

Hongyan

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d4dd80a85..8def4fb8d8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,52 @@  l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
+static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l3_pgentry_t ol3e = *pl3e;
+    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
+    l2_pgentry_t *l2t = alloc_xen_pagetable();
+
+    if ( !l2t )
+        return false;
+
+    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        l2e_write(l2t + i, l2e);
+        l2e = l2e_from_intpte(
+                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
+    }
+
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
+         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+    {
+        l3e_write_atomic(pl3e,
+            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
+        l2t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
+
+        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
+            flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+
+    if ( l2t )
+        free_xen_pagetable(l2t);
+
+    return true;
+}
+
 int map_pages_to_xen(
     unsigned long virt,
     mfn_t mfn,
@@ -5244,9 +5290,6 @@  int map_pages_to_xen(
         if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
              (l3e_get_flags(ol3e) & _PAGE_PSE) )
         {
-            unsigned int flush_flags =
-                FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
-
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
                                          L1_PAGETABLE_ENTRIES - 1)) +
@@ -5267,33 +5310,9 @@  int map_pages_to_xen(
                 continue;
             }
 
-            pl2e = alloc_xen_pagetable();
-            if ( pl2e == NULL )
+            /* Pass virt to indicate we need to flush. */
+            if ( !shatter_l3e(pl3e, virt, locking) )
                 return -ENOMEM;
-
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(ol3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(ol3e)));
-
-            if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
-                flush_flags |= FLUSH_TLB_GLOBAL;
-
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            flush_area(virt, flush_flags);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5578,27 +5597,8 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            pl2e = alloc_xen_pagetable();
-            if ( !pl2e )
+            if ( !shatter_l3e(pl3e, 0, locking) )
                 return -ENOMEM;
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(*pl3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(*pl3e)));
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
         }
 
         /*