diff mbox

[v4,11/31] x86/mm: split out writable pagetable emulation code

Message ID 20170817144456.18989-12-wei.liu2@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wei Liu Aug. 17, 2017, 2:44 p.m. UTC
Move the code to pv/emul-ptwr-op.c. Fix coding style issues while
moving the code.

Rename ptwr_emulated_read to pv_emul_ptwr_read and export it via
pv/mm.h because it is needed by other emulation code.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c              | 308 +-------------------------------------
 xen/arch/x86/pv/Makefile       |   1 +
 xen/arch/x86/pv/emul-ptwr-op.c | 327 +++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/emulate.h      |   2 +
 4 files changed, 332 insertions(+), 306 deletions(-)
 create mode 100644 xen/arch/x86/pv/emul-ptwr-op.c

Comments

Jan Beulich Aug. 24, 2017, 3:15 p.m. UTC | #1
>>> On 17.08.17 at 16:44, <wei.liu2@citrix.com> wrote:
> Move the code to pv/emul-ptwr-op.c. Fix coding style issues while
> moving the code.
> 
> Rename ptwr_emulated_read to pv_emul_ptwr_read and export it via
> pv/mm.h because it is needed by other emulation code.

If other emulated code uses it, renaming the function would better
imply dropping the ptwr infix from it. pv_emulated_read() perhaps?

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/mm.c              | 308 +-------------------------------------
>  xen/arch/x86/pv/Makefile       |   1 +
>  xen/arch/x86/pv/emul-ptwr-op.c | 327 +++++++++++++++++++++++++++++++++++++++++

Would you mind calling this just ptwr.c?

> +/*************************
> + * Writable Pagetables
> + */
> +
> +struct ptwr_emulate_ctxt {
> +    struct x86_emulate_ctxt ctxt;
> +    unsigned long cr2;
> +    l1_pgentry_t  pte;
> +};
>[...]
> +static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
> +                                unsigned int bytes, unsigned int do_cmpxchg,
> +                                struct ptwr_emulate_ctxt *ptwr_ctxt)

I've meanwhile noticed that in prior patches of yours such movement
was needlessly retaining the component prefixes. With you splitting
things into separate files, these aren't really useful anymore - stack
traces will have them disambiguated by being prefixed with their
file names. They merely eat valuable serial line bandwidth / ring
buffer space and clutter the (serial) log. I could accept the structure
tags to stay the way they are, but please shorten the local function
names as much as possible without losing information. That'll likely
mean dropping more than just the ptwr_ prefix.

Jan
Wei Liu Aug. 30, 2017, 2:07 p.m. UTC | #2
On Thu, Aug 24, 2017 at 09:15:36AM -0600, Jan Beulich wrote:
> >>> On 17.08.17 at 16:44, <wei.liu2@citrix.com> wrote:
> > Move the code to pv/emul-ptwr-op.c. Fix coding style issues while
> > moving the code.
> > 
> > Rename ptwr_emulated_read to pv_emul_ptwr_read and export it via
> > pv/mm.h because it is needed by other emulation code.
> 
> If other emulated code uses it, renaming the function would better
> imply dropping the ptwr infix from it. pv_emulated_read() perhaps?
> 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  xen/arch/x86/mm.c              | 308 +-------------------------------------
> >  xen/arch/x86/pv/Makefile       |   1 +
> >  xen/arch/x86/pv/emul-ptwr-op.c | 327 +++++++++++++++++++++++++++++++++++++++++
> 
> Would you mind calling this just ptwr.c?

That seems to deviate from the naming scheme we decided on, but I don't
think I care.

> 
> > +/*************************
> > + * Writable Pagetables
> > + */
> > +
> > +struct ptwr_emulate_ctxt {
> > +    struct x86_emulate_ctxt ctxt;
> > +    unsigned long cr2;
> > +    l1_pgentry_t  pte;
> > +};
> >[...]
> > +static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
> > +                                unsigned int bytes, unsigned int do_cmpxchg,
> > +                                struct ptwr_emulate_ctxt *ptwr_ctxt)
> 
> I've meanwhile noticed that in prior patches of yours such movement
> was needlessly retaining the component prefixes. With you splitting
> things into separate files, these aren't really useful anymore - stack
> traces will have them disambiguated by being prefixed with their
> file names. They merely eat valuable serial line bandwidth / ring
> buffer space and clutter the (serial) log. I could accept the structure
> tags to stay the way they are, but please shorten the local function
> names as much as possible without losing information. That'll likely
> mean dropping more than just the ptwr_ prefix.
> 

No problem.

Do you want me to change the ones I already moved? If so, I will do it
before we release 4.10.

> Jan
>
Jan Beulich Aug. 30, 2017, 3:23 p.m. UTC | #3
>>> On 30.08.17 at 16:07, <wei.liu2@citrix.com> wrote:
> On Thu, Aug 24, 2017 at 09:15:36AM -0600, Jan Beulich wrote:
>> >>> On 17.08.17 at 16:44, <wei.liu2@citrix.com> wrote:
>> > +/*************************
>> > + * Writable Pagetables
>> > + */
>> > +
>> > +struct ptwr_emulate_ctxt {
>> > +    struct x86_emulate_ctxt ctxt;
>> > +    unsigned long cr2;
>> > +    l1_pgentry_t  pte;
>> > +};
>> >[...]
>> > +static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
>> > +                                unsigned int bytes, unsigned int do_cmpxchg,
>> > +                                struct ptwr_emulate_ctxt *ptwr_ctxt)
>> 
>> I've meanwhile noticed that in prior patches of yours such movement
>> was needlessly retaining the component prefixes. With you splitting
>> things into separate files, these aren't really useful anymore - stack
>> traces will have them disambiguated by being prefixed with their
>> file names. They merely eat valuable serial line bandwidth / ring
>> buffer space and clutter the (serial) log. I could accept the structure
>> tags to stay the way they are, but please shorten the local function
>> names as much as possible without losing information. That'll likely
>> mean dropping more than just the ptwr_ prefix.
> 
> No problem.
> 
> Do you want me to change the ones I already moved? If so, I will do it
> before we release 4.10.

I'd likely be doing it at some point myself, so if you're willing to
do it, I would of course appreciate it.

Jan
Wei Liu Aug. 30, 2017, 3:43 p.m. UTC | #4
On Wed, Aug 30, 2017 at 09:23:20AM -0600, Jan Beulich wrote:
> >>> On 30.08.17 at 16:07, <wei.liu2@citrix.com> wrote:
> > On Thu, Aug 24, 2017 at 09:15:36AM -0600, Jan Beulich wrote:
> >> >>> On 17.08.17 at 16:44, <wei.liu2@citrix.com> wrote:
> >> > +/*************************
> >> > + * Writable Pagetables
> >> > + */
> >> > +
> >> > +struct ptwr_emulate_ctxt {
> >> > +    struct x86_emulate_ctxt ctxt;
> >> > +    unsigned long cr2;
> >> > +    l1_pgentry_t  pte;
> >> > +};
> >> >[...]
> >> > +static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
> >> > +                                unsigned int bytes, unsigned int do_cmpxchg,
> >> > +                                struct ptwr_emulate_ctxt *ptwr_ctxt)
> >> 
> >> I've meanwhile noticed that in prior patches of yours such movement
> >> was needlessly retaining the component prefixes. With you splitting
> >> things into separate files, these aren't really useful anymore - stack
> >> traces will have them disambiguated by being prefixed with their
> >> file names. They merely eat valuable serial line bandwidth / ring
> >> buffer space and clutter the (serial) log. I could accept the structure
> >> tags to stay the way they are, but please shorten the local function
> >> names as much as possible without losing information. That'll likely
> >> mean dropping more than just the ptwr_ prefix.
> > 
> > No problem.
> > 
> > Do you want me to change the ones I already moved? If so, I will do it
> > before we release 4.10.
> 
> I'd likely be doing it at some point myself, so if you're willing to
> do it, I would of course appreciate it.
> 

Sure, I can do that tomorrow or the day after tomorrow.
diff mbox

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4ac69b3804..3c0aa52f38 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4786,310 +4786,6 @@  long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 }
 
 
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    unsigned long cr2;
-    l1_pgentry_t  pte;
-};
-
-static int ptwr_emulated_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned int rc = bytes;
-    unsigned long addr = offset;
-
-    if ( !__addr_ok(addr) ||
-         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
-    unsigned long addr,
-    paddr_t old,
-    paddr_t val,
-    unsigned int bytes,
-    unsigned int do_cmpxchg,
-    struct ptwr_emulate_ctxt *ptwr_ctxt)
-{
-    unsigned long mfn;
-    unsigned long unaligned_addr = addr;
-    struct page_info *page;
-    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int ret;
-
-    /* Only allow naturally-aligned stores within the original %cr2 page. */
-    if ( unlikely(((addr ^ ptwr_ctxt->cr2) & PAGE_MASK) ||
-                  (addr & (bytes - 1))) )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                 ptwr_ctxt->cr2, addr, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
-    {
-        paddr_t      full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t) - 1);
-
-        /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t) - 1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
-        {
-            x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
-                               &ptwr_ctxt->ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-        /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
-        /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes * 8)) - 1);
-        val <<= (offset) * 8;
-        val  |= full;
-        /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes * 8)) - 1);
-        old <<= (offset) * 8;
-        old  |= full;
-    }
-
-    pte  = ptwr_ctxt->pte;
-    mfn  = l1e_get_pfn(pte);
-    page = mfn_to_page(mfn);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
-    ASSERT(mfn_valid(_mfn(mfn)));
-    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
-    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
-    ASSERT(page_get_owner(page) == d);
-
-    /* Check the new PTE. */
-    nl1e = l1e_from_intpte(val);
-    switch ( ret = get_page_from_l1e(nl1e, d, d) )
-    {
-    default:
-        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
-        {
-            /*
-             * If this is an upper-half write to a PAE PTE then we assume that
-             * the guest has simply got the two writes the wrong way round. We
-             * zap the PRESENT bit on the assumption that the bottom half will
-             * be written immediately after we return to the guest.
-             */
-            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
-                     PRIpte"\n", l1e_get_intpte(nl1e));
-            l1e_remove_flags(nl1e, _PAGE_PRESENT);
-        }
-        else
-        {
-            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
-            return X86EMUL_UNHANDLEABLE;
-        }
-        break;
-    case 0:
-        break;
-    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-        l1e_flip_flags(nl1e, ret);
-        break;
-    }
-
-    adjust_guest_l1e(nl1e, d);
-
-    /* Checked successfully: do the update (write or cmpxchg). */
-    pl1e = map_domain_page(_mfn(mfn));
-    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
-    {
-        bool okay;
-        intpte_t t = old;
-
-        ol1e = l1e_from_intpte(old);
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
-
-        if ( !okay )
-        {
-            unmap_domain_page(pl1e);
-            put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
-        }
-    }
-    else
-    {
-        ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
-            BUG();
-    }
-
-    trace_ptwr_emulation(addr, nl1e);
-
-    unmap_domain_page(pl1e);
-
-    /* Finally, drop the old PTE. */
-    put_page_from_l1e(ol1e, d);
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t val = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
-    {
-        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&val, p_data, bytes);
-
-    return ptwr_emulated_update(
-        offset, 0, val, bytes, 0,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int ptwr_emulated_cmpxchg(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_old,
-    void *p_new,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t old = 0, new = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
-    {
-        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&old, p_old, bytes);
-    memcpy(&new, p_new, bytes);
-
-    return ptwr_emulated_update(
-        offset, old, new, bytes, 1,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
-    .read       = ptwr_emulated_read,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = ptwr_emulated_write,
-    .cmpxchg    = ptwr_emulated_cmpxchg,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
-                       struct cpu_user_regs *regs)
-{
-    struct domain *d = v->domain;
-    struct page_info *page;
-    l1_pgentry_t      pte;
-    struct ptwr_emulate_ctxt ptwr_ctxt = {
-        .ctxt = {
-            .regs = regs,
-            .vendor = d->arch.cpuid->x86_vendor,
-            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .lma       = !is_pv_32bit_domain(d),
-        },
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    pv_get_guest_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
-         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
-         !get_page_from_mfn(_mfn(l1e_get_pfn(pte)), d) )
-        goto bail;
-
-    page = l1e_get_page(pte);
-    if ( !page_lock(page) )
-    {
-        put_page(page);
-        goto bail;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        put_page(page);
-        goto bail;
-    }
-
-    ptwr_ctxt.cr2 = addr;
-    ptwr_ctxt.pte = pte;
-
-    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
-
-    page_unlock(page);
-    put_page(page);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to pagetables which are marked
-         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
-         * update has succeeded on a different vcpu).  Anything else is an
-         * emulation bug, or a guest playing with the instruction stream under
-         * Xen's feet.
-         */
-        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ptwr_ctxt.ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ptwr_ctxt.ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
- bail:
-    return 0;
-}
-
 /*************************
  * fault handling for read-only MMIO pages
  */
@@ -5117,7 +4813,7 @@  int mmio_ro_emulated_write(
 
 static const struct x86_emulate_ops mmio_ro_emulate_ops = {
     .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
+    .insn_fetch = pv_emul_ptwr_read,
     .write      = mmio_ro_emulated_write,
     .validate   = pv_emul_is_mem_write,
     .cpuid      = pv_emul_cpuid,
@@ -5156,7 +4852,7 @@  int mmcfg_intercept_write(
 
 static const struct x86_emulate_ops mmcfg_intercept_ops = {
     .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
+    .insn_fetch = pv_emul_ptwr_read,
     .write      = mmcfg_intercept_write,
     .validate   = pv_emul_is_mem_write,
     .cpuid      = pv_emul_cpuid,
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index c83aed493b..cbd890c5f2 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -4,6 +4,7 @@  obj-y += emulate.o
 obj-y += emul-gate-op.o
 obj-y += emul-inv-op.o
 obj-y += emul-priv-op.o
+obj-y += emul-ptwr-op.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
diff --git a/xen/arch/x86/pv/emul-ptwr-op.c b/xen/arch/x86/pv/emul-ptwr-op.c
new file mode 100644
index 0000000000..17d7ee8f41
--- /dev/null
+++ b/xen/arch/x86/pv/emul-ptwr-op.c
@@ -0,0 +1,327 @@ 
+/******************************************************************************
+ * arch/x86/pv/emul-ptwr-op.c
+ *
+ * Emulate writable pagetable for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/trace.h>
+
+#include <asm/pv/mm.h>
+
+#include "emulate.h"
+
+/*************************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+    l1_pgentry_t  pte;
+};
+
+int pv_emul_ptwr_read(enum x86_segment seg, unsigned long offset, void *p_data,
+                      unsigned int bytes, struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int rc = bytes;
+    unsigned long addr = offset;
+
+    if ( !__addr_ok(addr) ||
+         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
+                                unsigned int bytes, unsigned int do_cmpxchg,
+                                struct ptwr_emulate_ctxt *ptwr_ctxt)
+{
+    unsigned long mfn;
+    unsigned long unaligned_addr = addr;
+    struct page_info *page;
+    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    int ret;
+
+    /* Only allow naturally-aligned stores within the original %cr2 page. */
+    if ( unlikely(((addr ^ ptwr_ctxt->cr2) & PAGE_MASK) ||
+                  (addr & (bytes - 1))) )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                 ptwr_ctxt->cr2, addr, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Turn a sub-word access into a full-word access. */
+    if ( bytes != sizeof(paddr_t) )
+    {
+        paddr_t      full;
+        unsigned int rc, offset = addr & (sizeof(paddr_t) - 1);
+
+        /* Align address; read full word. */
+        addr &= ~(sizeof(paddr_t) - 1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        {
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+        /* Mask out bits provided by caller. */
+        full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
+        /* Shift the caller value and OR in the missing bits. */
+        val  &= (((paddr_t)1 << (bytes * 8)) - 1);
+        val <<= (offset) * 8;
+        val  |= full;
+        /* Also fill in missing parts of the cmpxchg old value. */
+        old  &= (((paddr_t)1 << (bytes * 8)) - 1);
+        old <<= (offset) * 8;
+        old  |= full;
+    }
+
+    pte  = ptwr_ctxt->pte;
+    mfn  = l1e_get_pfn(pte);
+    page = mfn_to_page(mfn);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+    ASSERT(mfn_valid(_mfn(mfn)));
+    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+    ASSERT(page_get_owner(page) == d);
+
+    /* Check the new PTE. */
+    nl1e = l1e_from_intpte(val);
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
+    {
+    default:
+        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+        {
+            /*
+             * If this is an upper-half write to a PAE PTE then we assume that
+             * the guest has simply got the two writes the wrong way round. We
+             * zap the PRESENT bit on the assumption that the bottom half will
+             * be written immediately after we return to the guest.
+             */
+            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+                     PRIpte"\n", l1e_get_intpte(nl1e));
+            l1e_remove_flags(nl1e, _PAGE_PRESENT);
+        }
+        else
+        {
+            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
+            return X86EMUL_UNHANDLEABLE;
+        }
+        break;
+    case 0:
+        break;
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
+        break;
+    }
+
+    adjust_guest_l1e(nl1e, d);
+
+    /* Checked successfully: do the update (write or cmpxchg). */
+    pl1e = map_domain_page(_mfn(mfn));
+    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+    if ( do_cmpxchg )
+    {
+        bool okay;
+        intpte_t t = old;
+
+        ol1e = l1e_from_intpte(old);
+        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
+        okay = (okay && t == old);
+
+        if ( !okay )
+        {
+            unmap_domain_page(pl1e);
+            put_page_from_l1e(nl1e, d);
+            return X86EMUL_RETRY;
+        }
+    }
+    else
+    {
+        ol1e = *pl1e;
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+            BUG();
+    }
+
+    trace_ptwr_emulation(addr, nl1e);
+
+    unmap_domain_page(pl1e);
+
+    /* Finally, drop the old PTE. */
+    put_page_from_l1e(ol1e, d);
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(enum x86_segment seg, unsigned long offset,
+                               void *p_data, unsigned int bytes,
+                               struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t val = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    {
+        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&val, p_data, bytes);
+
+    return ptwr_emulated_update(
+        offset, 0, val, bytes, 0,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int ptwr_emulated_cmpxchg(enum x86_segment seg, unsigned long offset,
+                                 void *p_old, void *p_new, unsigned int bytes,
+                                 struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
+    {
+        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    return ptwr_emulated_update(
+        offset, old, new, bytes, 1,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+    .read       = pv_emul_ptwr_read,
+    .insn_fetch = pv_emul_ptwr_read,
+    .write      = ptwr_emulated_write,
+    .cmpxchg    = ptwr_emulated_cmpxchg,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
+                       struct cpu_user_regs *regs)
+{
+    struct domain *d = v->domain;
+    struct page_info *page;
+    l1_pgentry_t      pte;
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .ctxt = {
+            .regs = regs,
+            .vendor = d->arch.cpuid->x86_vendor,
+            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .lma       = !is_pv_32bit_domain(d),
+        },
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    pv_get_guest_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
+         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
+         !get_page_from_mfn(_mfn(l1e_get_pfn(pte)), d) )
+        goto bail;
+
+    page = l1e_get_page(pte);
+    if ( !page_lock(page) )
+    {
+        put_page(page);
+        goto bail;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        put_page(page);
+        goto bail;
+    }
+
+    ptwr_ctxt.cr2 = addr;
+    ptwr_ctxt.pte = pte;
+
+    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
+
+    page_unlock(page);
+    put_page(page);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which are marked
+         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
+         * update has succeeded on a different vcpu).  Anything else is an
+         * emulation bug, or a guest playing with the instruction stream under
+         * Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ptwr_ctxt.ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+ bail:
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/pv/emulate.h b/xen/arch/x86/pv/emulate.h
index 89abbe010f..7fb568adc0 100644
--- a/xen/arch/x86/pv/emulate.h
+++ b/xen/arch/x86/pv/emulate.h
@@ -10,4 +10,6 @@  void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip);
 int pv_emul_is_mem_write(const struct x86_emulate_state *state,
                          struct x86_emulate_ctxt *ctxt);
 
+int pv_emul_ptwr_read(enum x86_segment seg, unsigned long offset, void *p_data,
+                      unsigned int bytes, struct x86_emulate_ctxt *ctxt);
 #endif /* __PV_EMULATE_H__ */