
[1/2] x86/mm: factor out the code for shattering an l3 PTE

Message ID 86a6fb68f76ab287f5c2d99d1e7954e8b2d1ab05.1575646580.git.hongyxia@amazon.com (mailing list archive)
State Superseded
Series Refactor super page shattering

Commit Message

Xia, Hongyan Dec. 6, 2019, 3:53 p.m. UTC
map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l3 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
 xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 45 deletions(-)

Comments

Andrew Cooper Dec. 6, 2019, 5:50 p.m. UTC | #1
On 06/12/2019 15:53, Hongyan Xia wrote:
> map_pages_to_xen and modify_xen_mappings are performing almost exactly
> the same operations when shattering an l3 PTE, the only difference
> being whether we want to flush.
>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Just for reviewing purposes, how does this relate to your other posted
series?  Is it independent, a prerequisite, or does it depend on that
series?

> ---
>  xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>  1 file changed, 40 insertions(+), 45 deletions(-)
>
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 7d4dd80a85..42aaaa1083 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                           flush_area_local((const void *)v, f) : \
>                           flush_area_all((const void *)v, f))
>  
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> +        unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e = *pl3e;
> +
> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +        l2e_write(l2t + i,
> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> +                               (i << PAGETABLE_ORDER),
> +                               l3e_get_flags(ol3e)));

The PTE macros are especially poor for generated asm, and in cases like
this, I'd like to improve things.

In particular, I believe the following code has identical behaviour:

l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));

for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 += PAGETABLE_ORDER )
    l2e_write(l2t + i, nl2e);

(although someone please double check my logic) and rather better asm
generation.  (I also expect there to be some discussion on whether using
nl2e.l2 directly is something we'd want to start doing.)

> +
> +    if ( locking )
> +        spin_lock(&map_pgdir_lock);
> +    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
> +         (l3e_get_flags(ol3e) & _PAGE_PSE) )

There is a subtle difference between the original code, and the
refactored code, and it depends on the memory barrier from spin_lock().

Previously, it was re-read from memory after the lock, whereas now it is
likely the stale version from before map_pgdir was locked.

Either you can go back to the old version and use *pl3e, or
alternatively use:

    if ( locking )
        spin_lock(&map_pgdir_lock);
    ol3e = ACCESS_ONCE(*pl3e);
    if ( ...

to make it clear that a reread from memory is required.

> +    {
> +        l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),

This would probably generate better asm by using the maddr variants so
we don't have a double shift.

> +                                            __PAGE_HYPERVISOR));
> +        l2t = NULL;
> +    }
> +    if ( locking )
> +        spin_unlock(&map_pgdir_lock);
> +    if ( virt )
> +    {
> +        unsigned int flush_flags =
> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> +
> +        if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )
> +                flush_flags |= FLUSH_TLB_GLOBAL;
> +        flush_area(virt, flush_flags);
> +    }
> +    if ( l2t )
> +        free_xen_pagetable(l2t);

This surely needs to NULL out l2t, just so the caller doesn't get any
clever ideas and end up with a use-after-free?

~Andrew
Xia, Hongyan Dec. 7, 2019, 6:20 p.m. UTC | #2
Hi Andrew,

On Fri, 2019-12-06 at 17:50 +0000, Andrew Cooper wrote:
> On 06/12/2019 15:53, Hongyan Xia wrote:
> > map_pages_to_xen and modify_xen_mappings are performing almost
> > exactly
> > the same operations when shattering an l3 PTE, the only difference
> > being whether we want to flush.
> > 
> > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> 
> Just for reviewing purposes, how does this relate to your other
> posted
> series?  Is it independent, a prerequisite, or does it depend on that
> series?

This is independent. The other series I posted will touch a lot of PTE
functions, and as Jan suggested, it would be nice to refactor some of
the long and complicated ones before changing them, which could also
prepare us for 5-level paging in the future. If these refactoring
patches get in, though, I will need to rebase the series I posted
before.

> 
> > ---
> >  xen/arch/x86/mm.c | 85 ++++++++++++++++++++++---------------------
> > ----
> >  1 file changed, 40 insertions(+), 45 deletions(-)
> > 
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 7d4dd80a85..42aaaa1083 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> >                           flush_area_local((const void *)v, f) : \
> >                           flush_area_all((const void *)v, f))
> >  
> > +/* Shatter an l3 entry and populate l2. If virt is passed in, also
> > do flush. */
> > +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> > +        unsigned long virt, bool locking)
> > +{
> > +    unsigned int i;
> > +    l3_pgentry_t ol3e = *pl3e;
> > +
> > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > +        l2e_write(l2t + i,
> > +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> > +                               (i << PAGETABLE_ORDER),
> > +                               l3e_get_flags(ol3e)));
> 
> The PTE macros are especially poor for generated asm, and in cases
> like
> this, I'd like to improve things.
> 
> In particular, I believe the following code has identical behaviour:
> 
> l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> 
> for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
> PAGETABLE_ORDER )
>     l2e_write(l2t + i, nl2e);
> 
> (although someone please double check my logic) and rather better asm
> generation.  (I also expect there to be some discussion on whether
> using
> nl2e.l2 directly is something we'd want to start doing.)
> 

I believe it should be nl2e.l2 += 1<<(PAGETABLE_ORDER+PAGE_SHIFT) ?
The code rarely touches the field (.l2) directly, though, so maybe use
the macros (get_intpte -> add -> from_intpte) for consistency? They
should produce the same code if the compiler is not too stupid.
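
For illustration, a minimal and untested sketch of that (stride corrected,
and going through the intpte accessors rather than touching .l2; it reuses
i, ol3e and l2t from shatter_l3e):

intpte_t pte = l3e_get_intpte(ol3e);

for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
{
    l2e_write(l2t + i, l2e_from_intpte(pte));
    pte += 1UL << (PAGETABLE_ORDER + PAGE_SHIFT); /* step one 2M frame */
}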

> > +
> > +    if ( locking )
> > +        spin_lock(&map_pgdir_lock);
> > +    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
> > +         (l3e_get_flags(ol3e) & _PAGE_PSE) )
> 
> There is a subtle difference between the original code, and the
> refactored code, and it depends on the memory barrier from
> spin_lock().
> 
> Previously, it was re-read from memory after the lock, whereas now it
> is
> likely the stale version from before map_pgdir was locked.
> 
> Either you can go back to the old version and use *pl3e, or
> alternatively use:
> 
>     if ( locking )
>         spin_lock(&map_pgdir_lock);
>     ol3e = ACCESS_ONCE(*pl3e);
>     if ( ...
> 
> to make it clear that a reread from memory is required.
> 

Good point. Thanks.

> > +    {
> > +        l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
> 
> This would probably generate better asm by using the maddr variants
> so
> we don't have a double shift.
> 

Right. I can change that.
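
Presumably something along the lines of (illustrative only, not tested):

l3e_write_atomic(pl3e, l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));

which avoids the shift down in virt_to_mfn() followed by the shift back up
in l3e_from_mfn().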

> > +                                            __PAGE_HYPERVISOR));
> > +        l2t = NULL;
> > +    }
> > +    if ( locking )
> > +        spin_unlock(&map_pgdir_lock);
> > +    if ( virt )
> > +    {
> > +        unsigned int flush_flags =
> > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > +
> > +        if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )
> > +                flush_flags |= FLUSH_TLB_GLOBAL;
> > +        flush_area(virt, flush_flags);
> > +    }
> > +    if ( l2t )
> > +        free_xen_pagetable(l2t);
> 
> This surely needs to NULL out l2t, just so the caller doesn't get any
> clever ideas and end up with a use-after-free?
> 
> ~Andrew

hmm... if we want to make the nullification visible to the caller we
might need to pass &. I wonder if it makes sense to simply move the
memory allocation of pl2e into shatter_l3e as well, so that the caller
cannot have any ideas.

Thanks,
Hongyan
Julien Grall Dec. 7, 2019, 6:53 p.m. UTC | #3
Hi,

On 07/12/2019 18:20, Xia, Hongyan wrote:
> hmm... if we want to make the nullification visible to the caller we
> might need to pass &. I wonder if it makes sense to simply move the
> memory allocation of pl2e into shatter_l3e as well, so that the caller
> cannot have any ideas.

AFAICT, the allocation is done when shattering the page-tables. So it 
would make sense to move it within the shatter function. Note that you 
will need to propagate the error if there is any.
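
Something along these lines, perhaps (only a rough sketch; naming and error
handling are illustrative):

static int shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
{
    l2_pgentry_t *l2t = alloc_xen_pagetable();

    if ( !l2t )
        return -ENOMEM;

    /* ... populate l2t and update *pl3e under map_pgdir_lock as before ... */

    return 0;
}

with both callers just propagating a non-zero return value.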


Cheers,
Julien Grall Dec. 7, 2019, 7:04 p.m. UTC | #4
Hi Hongyan,

On 06/12/2019 15:53, Hongyan Xia wrote:
> map_pages_to_xen and modify_xen_mappings are performing almost exactly
> the same operations when shattering an l3 PTE, the only difference
> being whether we want to flush.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> ---
>   xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>   1 file changed, 40 insertions(+), 45 deletions(-)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 7d4dd80a85..42aaaa1083 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                            flush_area_local((const void *)v, f) : \
>                            flush_area_all((const void *)v, f))
>   
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> +        unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e = *pl3e;
> +
> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +        l2e_write(l2t + i,
> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> +                               (i << PAGETABLE_ORDER),
> +                               l3e_get_flags(ol3e)));

I understand this is just a factor out of the current code, but the code 
feels wrong (at least in theory) and wasteful.

You would allocate the L2 table, prepare it and then free it if the L3 
entry has _PAGE_PRESENT or _PAGE_PSE cleared.

Also, in theory, there is nothing preventing the l3 flags from being modified
behind your back. So you could end up generating the l2 entries with
the old l3 flags.

In a nutshell, it feels to me that the shattering/allocation should happen 
with the lock taken. This would also allow you to not allocate the l2 
table when you are removing the page.

Cheers,
Andrew Cooper Dec. 7, 2019, 7:37 p.m. UTC | #5
On 07/12/2019 19:04, Julien Grall wrote:
> Hi Hongyan,
>
> On 06/12/2019 15:53, Hongyan Xia wrote:
>> map_pages_to_xen and modify_xen_mappings are performing almost exactly
>> the same operations when shattering an l3 PTE, the only difference
>> being whether we want to flush.
>>
>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>> ---
>>   xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>>   1 file changed, 40 insertions(+), 45 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 7d4dd80a85..42aaaa1083 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>                            flush_area_local((const void *)v, f) : \
>>                            flush_area_all((const void *)v, f))
>>   +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>> do flush. */
>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>> +        unsigned long virt, bool locking)
>> +{
>> +    unsigned int i;
>> +    l3_pgentry_t ol3e = *pl3e;
>> +
>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>> +        l2e_write(l2t + i,
>> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
>> +                               (i << PAGETABLE_ORDER),
>> +                               l3e_get_flags(ol3e)));
>
> I understand this is just a factor out of the current code, but the
> code feels wrong (at least in theory) and wasteful.
>
> You would allocate the L2 table, prepare it and then free it if the L3
> entry has _PAGE_PRESENT or _PAGE_PSE cleared.
>
> Also, in theory, there is nothing preventing the l3 flags from being
> modified behind your back. So you could end up generating the l2
> entries with the old l3 flags.
>
> In nutshell, it feels to me that the shattering/allocation should
> happen with the lock taken. This would also allow you to not allocate
> the l2 table when you are removing the page.

The existing behaviour is deliberate and most likely wants to remain.

It makes adjustments safe to concurrent modifications, while reducing
the critical section of the lock to only the alteration of the live PTEs.

We don't expect concurrent conflicting updates to these pagetables at
all, but extending the locked region around the memory allocation and
writing the new pagetable is a bottleneck to parallel updates of
independent addresses.

~Andrew
Julien Grall Dec. 7, 2019, 11:39 p.m. UTC | #6
On 07/12/2019 19:37, Andrew Cooper wrote:
> On 07/12/2019 19:04, Julien Grall wrote:
>> Hi Hongyan,
>>
>> On 06/12/2019 15:53, Hongyan Xia wrote:
>>> map_pages_to_xen and modify_xen_mappings are performing almost exactly
>>> the same operations when shattering an l3 PTE, the only difference
>>> being whether we want to flush.
>>>
>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>> ---
>>>    xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>>>    1 file changed, 40 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>>> index 7d4dd80a85..42aaaa1083 100644
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>>                             flush_area_local((const void *)v, f) : \
>>>                             flush_area_all((const void *)v, f))
>>>    +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>>> do flush. */
>>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>>> +        unsigned long virt, bool locking)
>>> +{
>>> +    unsigned int i;
>>> +    l3_pgentry_t ol3e = *pl3e;
>>> +
>>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +        l2e_write(l2t + i,
>>> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
>>> +                               (i << PAGETABLE_ORDER),
>>> +                               l3e_get_flags(ol3e)));
>>
>> I understand this is just a factor out of the current code, but the
>> code feels wrong (at least in theory) and wasteful.
>>
>> You would allocate the L2 table, prepare it and then free it if the L3
>> entry has _PAGE_PRESENT or _PAGE_PSE cleared.
>>
>> Also, in theory, there is nothing preventing the l3 flags from being
>> modified behind your back. So you could end up generating the l2
>> entries with the old l3 flags.
>>
>> In a nutshell, it feels to me that the shattering/allocation should
>> happen with the lock taken. This would also allow you to not allocate
>> the l2 table when you are removing the page.
> 
> The existing behaviour is deliberate and most likely wants to remain.
> 
> It makes adjustments safe to concurrent modifications, while reducing
> the critical section of the lock to only the alteration of the live PTEs.
> 
> We don't expect concurrent conflicting updates to these pagetables at
> all, but extending the locked region around the memory allocation and
> writing the new pagetable is a bottleneck to parallel updates of
> independent addresses.

I am quite interested to know a bit more about the potential 
bottlenecks. When I rewrote the Xen PT helpers for Arm I didn't see many 
accesses to the Xen PT.

To make a comparison, I see many more updates to the P2M. So I would 
expect such an optimization to be more of a concern there. Yet we take the 
lock for the full update. Maybe this was an oversight when the P2M was 
created?

Cheers,
Jan Beulich Dec. 9, 2019, 1:22 p.m. UTC | #7
On 07.12.2019 19:20, Xia, Hongyan wrote:
> On Fri, 2019-12-06 at 17:50 +0000, Andrew Cooper wrote:
>> On 06/12/2019 15:53, Hongyan Xia wrote:
>>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>>> do flush. */
>>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>>> +        unsigned long virt, bool locking)
>>> +{
>>> +    unsigned int i;
>>> +    l3_pgentry_t ol3e = *pl3e;
>>> +
>>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +        l2e_write(l2t + i,
>>> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
>>> +                               (i << PAGETABLE_ORDER),
>>> +                               l3e_get_flags(ol3e)));
>>
>> The PTE macros are especially poor for generated asm, and in cases
>> like
>> this, I'd like to improve things.
>>
>> In particular, I believe the following code has identical behaviour:
>>
>> l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>>
>> for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
>> PAGETABLE_ORDER )
>>     l2e_write(l2t + i, nl2e);
>>
>> (although someone please double check my logic) and rather better asm
>> generation.  (I also expect there to be some discussion on whether
>> using
>> nl2e.l2 directly is something we'd want to start doing.)
>>
> 
> I believe it should be nl2e.l2 += 1<<(PAGETABLE_ORDER+PAGE_SHIFT) ?

Indeed.

> The code rarely touches the field (.l2) directly, though, so maybe use
> the macros (get_intpte -> add -> from_intpte) for consistency? They
> should produce the same code if the compiler is not too stupid.

I think the to/from intpte transformations should be used sparingly
too (as being dangerous). How about we make PTEs proper unions, with
a full-field intpte_t as we have it now combined with a set of bit
fields? This would at least eliminate the need for using PAGE_SHIFT
in constructs like the above.
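
Roughly something like this (a sketch only; exact field names and widths
would of course need settling):

typedef union {
    intpte_t pte;            /* whole-entry view, as we have today */
    struct {
        uint64_t flags:12;   /* P, RW, US, ..., plus AVL bits 9-11 */
        uint64_t mfn:40;     /* frame number, bits 12-51 */
        uint64_t avail:11;   /* software-available bits 52-62 */
        uint64_t nx:1;       /* bit 63 */
    };
} l2_pgentry_t;

The loop above could then advance the entry with
nl2e.mfn += 1UL << PAGETABLE_ORDER, without any PAGE_SHIFT arithmetic.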

Jan
Xia, Hongyan Dec. 10, 2019, 10:07 a.m. UTC | #8
On Mon, 2019-12-09 at 14:22 +0100, Jan Beulich wrote:
> On 07.12.2019 19:20, Xia, Hongyan wrote:
> > On Fri, 2019-12-06 at 17:50 +0000, Andrew Cooper wrote:
> > > On 06/12/2019 15:53, Hongyan Xia wrote:
> > > > +/* Shatter an l3 entry and populate l2. If virt is passed in,
> > > > also
> > > > do flush. */
> > > > +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> > > > +        unsigned long virt, bool locking)
> > > > +{
> > > > +    unsigned int i;
> > > > +    l3_pgentry_t ol3e = *pl3e;
> > > > +
> > > > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > > > +        l2e_write(l2t + i,
> > > > +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> > > > +                               (i << PAGETABLE_ORDER),
> > > > +                               l3e_get_flags(ol3e)));
> > > 
> > > The PTE macros are especially poor for generated asm, and in
> > > cases
> > > like
> > > this, I'd like to improve things.
> > > 
> > > In particular, I believe the following code has identical
> > > behaviour:
> > > 
> > > l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > > 
> > > for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
> > > PAGETABLE_ORDER )
> > >     l2e_write(l2t + i, nl2e);
> > > 
> > > (although someone please double check my logic) and rather better
> > > asm
> > > generation.  (I also expect there to be some discussion on
> > > whether
> > > using
> > > nl2e.l2 directly is something we'd want to start doing.)
> > > 
> > 
> > I believe it should be nl2e.l2 += 1<<(PAGETABLE_ORDER+PAGE_SHIFT) ?
> 
> Indeed.
> 
> > The code rarely touches the field (.l2) directly, though, so maybe use
> > the macros (get_intpte -> add -> from_intpte) for consistency? They
> > should produce the same code if the compiler is not too stupid.
> 
> I think the to/from intpte transformations should be used sparingly
> too (as being dangerous). How about we make PTEs proper unions, with
> a full-field intpte_t as we have it now combined with a set of bit
> fields? This would at least eliminate the need for using PAGE_SHIFT
> in constructs like the above.

I can see this makes the code look much nicer. One concern I have is
that Andrew's suggestion was to improve the generated assembly code,
and using packed bit fields may generate even more masking and bit
shifting, which in the end might give us more assembly code than before
the refactoring.

Hongyan

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d4dd80a85..42aaaa1083 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,43 @@  l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
+static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
+        unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l3_pgentry_t ol3e = *pl3e;
+
+    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
+        l2e_write(l2t + i,
+                  l2e_from_pfn(l3e_get_pfn(ol3e) +
+                               (i << PAGETABLE_ORDER),
+                               l3e_get_flags(ol3e)));
+
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
+         (l3e_get_flags(ol3e) & _PAGE_PSE) )
+    {
+        l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
+                                            __PAGE_HYPERVISOR));
+        l2t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
+
+        if ( (l3e_get_flags(ol3e) & _PAGE_GLOBAL) )
+                flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+    if ( l2t )
+        free_xen_pagetable(l2t);
+}
+
 int map_pages_to_xen(
     unsigned long virt,
     mfn_t mfn,
@@ -5244,9 +5281,6 @@  int map_pages_to_xen(
         if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
              (l3e_get_flags(ol3e) & _PAGE_PSE) )
         {
-            unsigned int flush_flags =
-                FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
-
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
                                          L1_PAGETABLE_ENTRIES - 1)) +
@@ -5270,30 +5304,8 @@  int map_pages_to_xen(
             pl2e = alloc_xen_pagetable();
             if ( pl2e == NULL )
                 return -ENOMEM;
-
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(ol3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(ol3e)));
-
-            if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
-                flush_flags |= FLUSH_TLB_GLOBAL;
-
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            flush_area(virt, flush_flags);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
+            /* Pass virt to indicate we need to flush. */
+            shatter_l3e(pl3e, pl2e, virt, locking);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5581,24 +5593,7 @@  int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             pl2e = alloc_xen_pagetable();
             if ( !pl2e )
                 return -ENOMEM;
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(*pl3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(*pl3e)));
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
+            shatter_l3e(pl3e, pl2e, 0, locking);
         }
 
         /*