[v2,06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()

Message ID 1480331616-6165-7-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper Nov. 28, 2016, 11:13 a.m. UTC
To help with event injection improvements for the PV uses of x86_emulate(),
implement an event injection API which matches its hvm counterpart.

This starts by taking do_guest_trap() and modifying its calling API to that
of pv_inject_event(), subsequently implementing the former in terms of the
latter.

The existing propagate_page_fault() is fairly similar to
pv_inject_page_fault(), although it has a return value.  Only a single caller
makes use of the return value, and non-NULL is only returned if the passed cr2
is non-canonical.  Opencode this single case in
handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
void.

The #PF specific bits are moved into pv_inject_event(), and
pv_inject_page_fault() is implemented as a static inline wrapper.
reserved_bit_page_fault() is pure code motion.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm.c               |   5 +-
 xen/arch/x86/mm/shadow/common.c |   4 +-
 xen/arch/x86/traps.c            | 172 ++++++++++++++++++++--------------------
 xen/include/asm-x86/domain.h    |  26 ++++++
 xen/include/asm-x86/mm.h        |   1 -
 5 files changed, 118 insertions(+), 90 deletions(-)
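
For reference, the API surface this patch introduces (pv_inject_hw_exception()
and pv_inject_page_fault() are static inline wrappers around pv_inject_event();
the full definitions are in the asm-x86/domain.h hunk below):

    void pv_inject_event(const struct x86_event *event);
    void pv_inject_hw_exception(unsigned int vector, int errcode);
    void pv_inject_page_fault(int errcode, unsigned long cr2);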

Comments

Tim Deegan Nov. 28, 2016, 11:58 a.m. UTC | #1
At 11:13 +0000 on 28 Nov (1480331603), Andrew Cooper wrote:
> To help with event injection improvements for the PV uses of x86_emulate(),
> implement an event injection API which matches its hvm counterpart.
> 
> This starts by taking do_guest_trap() and modifying its calling API to that
> of pv_inject_event(), subsequently implementing the former in terms of the
> latter.
> 
> The existing propagate_page_fault() is fairly similar to
> pv_inject_page_fault(), although it has a return value.  Only a single caller
> makes use of the return value, and non-NULL is only returned if the passed cr2
> is non-canonical.  Opencode this single case in
> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
> void.
> 
> The #PF specific bits are moved into pv_inject_event(), and
> pv_inject_page_fault() is implemented as a static inline wrapper.
> reserved_bit_page_fault() is pure code motion.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>

with one note:

> +    if ( vector == TRAP_page_fault )
> +    {
> +        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
> +        arch_set_cr2(v, event->cr2);
> +
> +        /* Re-set error_code.user flag appropriately for the guest. */
> +        error_code &= ~PFEC_user_mode;
> +        if ( !guest_kernel_mode(v, regs) )
> +            error_code |= PFEC_user_mode;

I can see that you're just moving this code, but isn't it wrong for
what are now called "implicit" accesses?  My Ack stands on this patch
regardless.

Cheers,

Tim.
Andrew Cooper Nov. 28, 2016, 11:59 a.m. UTC | #2
On 28/11/16 11:58, Tim Deegan wrote:
> At 11:13 +0000 on 28 Nov (1480331603), Andrew Cooper wrote:
>> To help with event injection improvements for the PV uses of x86_emulate(),
>> implement an event injection API which matches its hvm counterpart.
>>
>> This starts by taking do_guest_trap() and modifying its calling API to that
>> of pv_inject_event(), subsequently implementing the former in terms of the
>> latter.
>>
>> The existing propagate_page_fault() is fairly similar to
>> pv_inject_page_fault(), although it has a return value.  Only a single caller
>> makes use of the return value, and non-NULL is only returned if the passed cr2
>> is non-canonical.  Opencode this single case in
>> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
>> void.
>>
>> The #PF specific bits are moved into pv_inject_event(), and
>> pv_inject_page_fault() is implemented as a static inline wrapper.
>> reserved_bit_page_fault() is pure code motion.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>
>
> with one note:
>
>> +    if ( vector == TRAP_page_fault )
>> +    {
>> +        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
>> +        arch_set_cr2(v, event->cr2);
>> +
>> +        /* Re-set error_code.user flag appropriately for the guest. */
>> +        error_code &= ~PFEC_user_mode;
>> +        if ( !guest_kernel_mode(v, regs) )
>> +            error_code |= PFEC_user_mode;
> I can see that you're just moving this code, but isn't it wrong for
> what are now called "implicit" accesses?  My Ack stands on this patch
> regardless.

One swamp at a time. :)

I have an equally sized series for that mess.

~Andrew
Jan Beulich Nov. 29, 2016, 4 p.m. UTC | #3
>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
> The existing propagate_page_fault() is fairly similar to
> pv_inject_page_fault(), although it has a return value.  Only a single caller
> makes use of the return value, and non-NULL is only returned if the passed cr2
> is non-canonical.  Opencode this single case in
> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
> void.

I can only say that back then it was quite intentional to not open
code this in the caller, no matter that it was (and still is) just one.

>      if ( unlikely(null_trap_bounce(v, tb)) )
> -        gprintk(XENLOG_WARNING,
> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
> -                trapstr(trapnr), trapnr, regs->error_code);
> +    {
> +        if ( vector == TRAP_page_fault )
> +        {
> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
> +            show_page_walk(event->cr2);
> +
> +            if ( unlikely(error_code & PFEC_reserved_bit) )
> +                reserved_bit_page_fault(event->cr2, regs);

I think you want to move the show_page_walk() into an else here,
to avoid logging two of them. But then again - why do you move
this behind the null_trap_bounce() check? It had been logged
independently in the original code, and for (imo) a good reason
(reserved bit faults are always a sign of a hypervisor problem
after all).
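
Something along these lines would avoid logging the walk twice (a sketch of
the suggestion only, not what this patch does):

    printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);

    if ( unlikely(error_code & PFEC_reserved_bit) )
        reserved_bit_page_fault(event->cr2, regs); /* logs its own page walk */
    else
        show_page_walk(event->cr2);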

> +        }
> +        else
> +            gprintk(XENLOG_WARNING,
> +                    "Unhandled %s fault/trap [#%d, ec=%04x]\n",
> +                    trapstr(vector), vector, error_code);

Which tells us that we need to finally get our log level handling
straightened, to avoid inconsistencies like the one here comparing
with the code a few lines up.

> +static inline void do_guest_trap(unsigned int trapnr,
> +                                 const struct cpu_user_regs *regs)
> +{
> +    const struct x86_event event = {

I don't mind the const, but I don't think it's very useful here.

Jan
Andrew Cooper Nov. 29, 2016, 4:50 p.m. UTC | #4
On 29/11/16 16:00, Jan Beulich wrote:
>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>> The existing propagate_page_fault() is fairly similar to
>> pv_inject_page_fault(), although it has a return value.  Only a single caller
>> makes use of the return value, and non-NULL is only returned if the passed cr2
>> is non-canonical.  Opencode this single case in
>> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
>> void.
> I can only say that back then it was quite intentional to not open
> code this in the caller, no matter that it was (and still is) just one.

Is that an objection to me making this change?

>
>>      if ( unlikely(null_trap_bounce(v, tb)) )
>> -        gprintk(XENLOG_WARNING,
>> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>> -                trapstr(trapnr), trapnr, regs->error_code);
>> +    {
>> +        if ( vector == TRAP_page_fault )
>> +        {
>> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
>> +            show_page_walk(event->cr2);
>> +
>> +            if ( unlikely(error_code & PFEC_reserved_bit) )
>> +                reserved_bit_page_fault(event->cr2, regs);
> I think you want to move the show_page_walk() into an else here,
> to avoid logging two of them. But then again - why do you move
> this behind the null_trap_bounce() check? It had been logged
> independently in the original code, and for (imo) a good reason
> (reserved bit faults are always a sign of a hypervisor problem
> after all).

TBH, I found it odd that it was in propagate_page_fault() to start with.

It is the kind of thing which should be in the pagefault handler itself,
not in the reinjection-to-pv-guests code.

Would moving it into fixup_page_fault() be ok?  It should probably go
between the hap/shadow and memadd checks, as shadow guests can have
reserved bits set.

>
>> +        }
>> +        else
>> +            gprintk(XENLOG_WARNING,
>> +                    "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>> +                    trapstr(vector), vector, error_code);
> Which tells us that we need to finally get our log level handling
> straightened, to avoid inconsistencies like the one here comparing
> with the code a few lines up.

Yes.  I chose not to unpick that swamp right now, but if
reserved_bit_page_fault() gets moved out, this bit can be simplified to
this single gprintk().

~Andrew
Jan Beulich Nov. 30, 2016, 8:41 a.m. UTC | #5
>>> On 29.11.16 at 17:50, <andrew.cooper3@citrix.com> wrote:
> On 29/11/16 16:00, Jan Beulich wrote:
>>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>>> The existing propagate_page_fault() is fairly similar to
>>> pv_inject_page_fault(), although it has a return value.  Only a single caller
>>> makes use of the return value, and non-NULL is only returned if the passed cr2
>>> is non-canonical.  Opencode this single case in
>>> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
>>> void.
>> I can only say that back then it was quite intentional to not open
>> code this in the caller, no matter that it was (and still is) just one.
> 
> Is that an objection to me making this change?

Well, no, not really. It's more like "I'd prefer it to stay as is, but I can
see why you want to change it, and not changing it would likely make
your overall modification harder".

>>>      if ( unlikely(null_trap_bounce(v, tb)) )
>>> -        gprintk(XENLOG_WARNING,
>>> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>>> -                trapstr(trapnr), trapnr, regs->error_code);
>>> +    {
>>> +        if ( vector == TRAP_page_fault )
>>> +        {
>>> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
>>> +            show_page_walk(event->cr2);
>>> +
>>> +            if ( unlikely(error_code & PFEC_reserved_bit) )
>>> +                reserved_bit_page_fault(event->cr2, regs);
>> I think you want to move the show_page_walk() into an else here,
>> to avoid logging two of them. But then again - why do you move
>> this behind the null_trap_bounce() check? It had been logged
>> independently in the original code, and for (imo) a good reason
>> (reserved bit faults are always a sign of a hypervisor problem
>> after all).
> 
> TBH, I found it odd that it was in propagate_page_fault() to start with.
> 
> It is the kind of thing which should be in the pagefault handler itself,
> not in the reinjection-to-pv-guests code.
> 
> Would moving it into fixup_page_fault() be ok?  It should probably go
> between the hap/shadow and memadd checks, as shadow guests can have
> reserved bits set.

That would move it ahead in the flow quite a bit, which I'm not sure
is a good idea (due to possible [hypothetical] other uses of reserved
bits). Also note that there already is a call to reserved_bit_page_fault()
in the !guest_mode() case, so if anything I would see it moved right
ahead of the pv_inject_page_fault() invocation from do_page_fault().
(This would then shrink the patch size too, as you wouldn't have to move
around reserved_bit_page_fault() itself.)
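
In do_page_fault() that would come to roughly the following (sketch only,
reusing the existing addr and regs variables there):

    if ( unlikely(regs->error_code & PFEC_reserved_bit) )
        reserved_bit_page_fault(addr, regs);

    pv_inject_page_fault(regs->error_code, addr);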

Btw, looking at the full patch again I notice that the error code
parameter to pv_inject_page_fault() is signed, which is contrary to
what I think I recall you saying on the HVM side of things (the
error code being non-optional here, and hence better being unsigned,
as X86_EVENT_NO_EC is not allowed).

Jan
Andrew Cooper Nov. 30, 2016, 1:17 p.m. UTC | #6
On 30/11/16 08:41, Jan Beulich wrote:
>>>>      if ( unlikely(null_trap_bounce(v, tb)) )
>>>> -        gprintk(XENLOG_WARNING,
>>>> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>>>> -                trapstr(trapnr), trapnr, regs->error_code);
>>>> +    {
>>>> +        if ( vector == TRAP_page_fault )
>>>> +        {
>>>> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
>>>> +            show_page_walk(event->cr2);
>>>> +
>>>> +            if ( unlikely(error_code & PFEC_reserved_bit) )
>>>> +                reserved_bit_page_fault(event->cr2, regs);
>>> I think you want to move the show_page_walk() into an else here,
>>> to avoid logging two of them. But then again - why do you move
>>> this behind the null_trap_bounce() check? It had been logged
>>> independently in the original code, and for (imo) a good reason
>>> (reserved bit faults are always a sign of a hypervisor problem
>>> after all).
>> TBH, I found it odd that it was in propagate_page_fault() to start with.
>>
>> It is the kind of thing which should be in the pagefault handler itself,
>> not in the reinjection-to-pv-guests code.
>>
>> Would moving it into fixup_page_fault() be ok?  It should probably go
>> between the hap/shadow and memadd checks, as shadow guests can have
>> reserved bits set.
> That would move it ahead in the flow quite a bit, which I'm not sure
> is a good idea (due to possible [hypothetical] other uses of reserved
> bits). Also note that there already is a call to reserved_bit_page_fault()
> in the !guest_mode() case, so if anything I would see it moved right
> ahead of the pv_inject_page_fault() invocation from do_page_fault().
> (This would then shrink the patch size too, as you wouldn't have to move
> around reserved_bit_page_fault() itself.)

Done.  This looks much cleaner.

>
> Btw, looking at the full patch again I notice that the error code
> parameter to pv_inject_page_fault() is signed, which is contrary to
> what I think I recall you saying on the HVM side of things (the
> error code being non-optional here, and hence better being unsigned,
> as X86_EVENT_NO_EC is not allowed).

hvm_inject_page_fault() has always had a signed error code, and the API
is maintained by x86_emul_* and pv_inject_* for consistency.

struct pfinfo currently has an unsigned ec parameter, because all the
existing code uses uint32_t pfec.  In reality, this isn't a problem at
the pfinfo / *_inject_page_fault() boundary.

I have a number of patches focusing on pagefault, and in particular
trying to clean up a lot of misuse of pfec.

~Andrew

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d365f59..b7c7122 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5136,7 +5136,7 @@  static int ptwr_emulated_read(
     if ( !__addr_ok(addr) ||
          (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
     {
-        propagate_page_fault(addr + bytes - rc, 0); /* read fault */
+        pv_inject_page_fault(0, addr + bytes - rc); /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -5177,7 +5177,8 @@  static int ptwr_emulated_update(
         addr &= ~(sizeof(paddr_t)-1);
         if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
         {
-            propagate_page_fault(addr+sizeof(paddr_t)-rc, 0); /* read fault */
+            pv_inject_page_fault(0, /* Read fault. */
+                                 addr + sizeof(paddr_t) - rc);
             return X86EMUL_EXCEPTION;
         }
         /* Mask out bits provided by caller. */
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index a4a3c4b..f07803b 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -323,7 +323,7 @@  pv_emulate_read(enum x86_segment seg,
 
     if ( (rc = copy_from_user(p_data, (void *)offset, bytes)) != 0 )
     {
-        propagate_page_fault(offset + bytes - rc, 0); /* read fault */
+        pv_inject_page_fault(0, offset + bytes - rc); /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -1723,7 +1723,7 @@  static mfn_t emulate_gva_to_mfn(struct vcpu *v, unsigned long vaddr,
         if ( is_hvm_vcpu(v) )
             hvm_inject_page_fault(pfec, vaddr);
         else
-            propagate_page_fault(vaddr, pfec);
+            pv_inject_page_fault(pfec, vaddr);
         return _mfn(BAD_GVA_TO_GFN);
     }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b464211..7301298 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -625,37 +625,90 @@  void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
           (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-static void do_guest_trap(unsigned int trapnr,
-                          const struct cpu_user_regs *regs)
+static void reserved_bit_page_fault(
+    unsigned long addr, struct cpu_user_regs *regs)
+{
+    printk("%pv: reserved bit in page table (ec=%04X)\n",
+           current, regs->error_code);
+    show_page_walk(addr);
+    show_execution_state(regs);
+}
+
+void pv_inject_event(const struct x86_event *event)
 {
     struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
     struct trap_bounce *tb;
     const struct trap_info *ti;
+    const uint8_t vector = event->vector;
     const bool use_error_code =
-        ((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)));
+        ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
+    unsigned int error_code = event->error_code;
 
-    trace_pv_trap(trapnr, regs->eip, use_error_code, regs->error_code);
+    ASSERT(vector == event->vector); /* Check for no truncation. */
+    if ( use_error_code )
+        ASSERT(error_code != X86_EVENT_NO_EC);
+    else
+        ASSERT(error_code == X86_EVENT_NO_EC);
 
     tb = &v->arch.pv_vcpu.trap_bounce;
-    ti = &v->arch.pv_vcpu.trap_ctxt[trapnr];
+    ti = &v->arch.pv_vcpu.trap_ctxt[vector];
 
     tb->flags = TBF_EXCEPTION;
     tb->cs    = ti->cs;
     tb->eip   = ti->address;
 
+    if ( vector == TRAP_page_fault )
+    {
+        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
+        arch_set_cr2(v, event->cr2);
+
+        /* Re-set error_code.user flag appropriately for the guest. */
+        error_code &= ~PFEC_user_mode;
+        if ( !guest_kernel_mode(v, regs) )
+            error_code |= PFEC_user_mode;
+
+        trace_pv_page_fault(event->cr2, error_code);
+    }
+    else
+        trace_pv_trap(vector, regs->eip, use_error_code, error_code);
+
     if ( use_error_code )
     {
         tb->flags |= TBF_EXCEPTION_ERRCODE;
-        tb->error_code = regs->error_code;
+        tb->error_code = error_code;
     }
 
     if ( TI_GET_IF(ti) )
         tb->flags |= TBF_INTERRUPT;
 
     if ( unlikely(null_trap_bounce(v, tb)) )
-        gprintk(XENLOG_WARNING,
-                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
-                trapstr(trapnr), trapnr, regs->error_code);
+    {
+        if ( vector == TRAP_page_fault )
+        {
+            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
+            show_page_walk(event->cr2);
+
+            if ( unlikely(error_code & PFEC_reserved_bit) )
+                reserved_bit_page_fault(event->cr2, regs);
+        }
+        else
+            gprintk(XENLOG_WARNING,
+                    "Unhandled %s fault/trap [#%d, ec=%04x]\n",
+                    trapstr(vector), vector, error_code);
+    }
+}
+
+static inline void do_guest_trap(unsigned int trapnr,
+                                 const struct cpu_user_regs *regs)
+{
+    const struct x86_event event = {
+        .vector = trapnr,
+        .error_code = (((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)))
+                       ? regs->error_code : X86_EVENT_NO_EC),
+    };
+
+    pv_inject_event(&event);
 }
 
 static void instruction_done(
@@ -1289,7 +1342,7 @@  static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     eip = regs->eip;
     if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
     {
-        propagate_page_fault(eip + sizeof(opcode) - rc, 0);
+        pv_inject_page_fault(0, eip + sizeof(opcode) - rc);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
@@ -1310,7 +1363,7 @@  static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     /* Check for forced emulation signature: ud2 ; .ascii "xen". */
     if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
     {
-        propagate_page_fault(eip + sizeof(sig) - rc, 0);
+        pv_inject_page_fault(0, eip + sizeof(sig) - rc);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(sig, "\xf\xbxen", sizeof(sig)) )
@@ -1320,7 +1373,7 @@  static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     /* We only emulate CPUID. */
     if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
     {
-        propagate_page_fault(eip + sizeof(instr) - rc, 0);
+        pv_inject_page_fault(0, eip + sizeof(instr) - rc);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
@@ -1478,62 +1531,6 @@  void do_int3(struct cpu_user_regs *regs)
     do_guest_trap(TRAP_int3, regs);
 }
 
-static void reserved_bit_page_fault(
-    unsigned long addr, struct cpu_user_regs *regs)
-{
-    printk("%pv: reserved bit in page table (ec=%04X)\n",
-           current, regs->error_code);
-    show_page_walk(addr);
-    show_execution_state(regs);
-}
-
-struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code)
-{
-    struct trap_info *ti;
-    struct vcpu *v = current;
-    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
-
-    if ( unlikely(!is_canonical_address(addr)) )
-    {
-        ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
-        tb->flags      = TBF_EXCEPTION | TBF_EXCEPTION_ERRCODE;
-        tb->error_code = 0;
-        tb->cs         = ti->cs;
-        tb->eip        = ti->address;
-        if ( TI_GET_IF(ti) )
-            tb->flags |= TBF_INTERRUPT;
-        return tb;
-    }
-
-    v->arch.pv_vcpu.ctrlreg[2] = addr;
-    arch_set_cr2(v, addr);
-
-    /* Re-set error_code.user flag appropriately for the guest. */
-    error_code &= ~PFEC_user_mode;
-    if ( !guest_kernel_mode(v, guest_cpu_user_regs()) )
-        error_code |= PFEC_user_mode;
-
-    trace_pv_page_fault(addr, error_code);
-
-    ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_page_fault];
-    tb->flags = TBF_EXCEPTION | TBF_EXCEPTION_ERRCODE;
-    tb->error_code = error_code;
-    tb->cs         = ti->cs;
-    tb->eip        = ti->address;
-    if ( TI_GET_IF(ti) )
-        tb->flags |= TBF_INTERRUPT;
-    if ( unlikely(null_trap_bounce(v, tb)) )
-    {
-        printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
-        show_page_walk(addr);
-    }
-
-    if ( unlikely(error_code & PFEC_reserved_bit) )
-        reserved_bit_page_fault(addr, guest_cpu_user_regs());
-
-    return NULL;
-}
-
 static int handle_gdt_ldt_mapping_fault(
     unsigned long offset, struct cpu_user_regs *regs)
 {
@@ -1565,17 +1562,22 @@  static int handle_gdt_ldt_mapping_fault(
         }
         else
         {
-            struct trap_bounce *tb;
-
             /* In hypervisor mode? Leave it to the #PF handler to fix up. */
             if ( !guest_mode(regs) )
                 return 0;
-            /* In guest mode? Propagate fault to guest, with adjusted %cr2. */
-            tb = propagate_page_fault(curr->arch.pv_vcpu.ldt_base + offset,
-                                      regs->error_code);
-            if ( tb )
-                tb->error_code = (offset & ~(X86_XEC_EXT | X86_XEC_IDT)) |
-                                 X86_XEC_TI;
+
+            /* Access would have become non-canonical? Pass #GP[sel] back. */
+            if ( unlikely(!is_canonical_address(
+                              curr->arch.pv_vcpu.ldt_base + offset)) )
+            {
+                uint16_t ec = (offset & ~(X86_XEC_EXT | X86_XEC_IDT)) | X86_XEC_TI;
+
+                pv_inject_hw_exception(TRAP_gp_fault, ec);
+            }
+            else
+                /* else pass the #PF back, with adjusted %cr2. */
+                pv_inject_page_fault(regs->error_code,
+                                     curr->arch.pv_vcpu.ldt_base + offset);
         }
     }
     else
@@ -1858,7 +1860,7 @@  void do_page_fault(struct cpu_user_regs *regs)
             return;
     }
 
-    propagate_page_fault(addr, regs->error_code);
+    pv_inject_page_fault(regs->error_code, addr);
 }
 
 /*
@@ -2788,7 +2790,7 @@  int pv_emul_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx,
         goto fail;                                                          \
     if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 )       \
     {                                                                       \
-        propagate_page_fault(_ptr + sizeof(_x) - _rc, 0);                   \
+        pv_inject_page_fault(0, _ptr + sizeof(_x) - _rc);                   \
         goto skip;                                                          \
     }                                                                       \
     (eip) += sizeof(_x); _x; })
@@ -2953,8 +2955,8 @@  static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( (rc = copy_to_user((void *)data_base + rd_ad(edi),
                                     &data, op_bytes)) != 0 )
             {
-                propagate_page_fault(data_base + rd_ad(edi) + op_bytes - rc,
-                                     PFEC_write_access);
+                pv_inject_page_fault(PFEC_write_access,
+                                     data_base + rd_ad(edi) + op_bytes - rc);
                 return EXCRET_fault_fixed;
             }
             wr_ad(edi, regs->edi + (int)((regs->eflags & X86_EFLAGS_DF)
@@ -2971,8 +2973,8 @@  static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( (rc = copy_from_user(&data, (void *)data_base + rd_ad(esi),
                                       op_bytes)) != 0 )
             {
-                propagate_page_fault(data_base + rd_ad(esi)
-                                     + op_bytes - rc, 0);
+                pv_inject_page_fault(0, data_base + rd_ad(esi)
+                                     + op_bytes - rc);
                 return EXCRET_fault_fixed;
             }
             guest_io_write(port, op_bytes, data, currd);
@@ -3529,8 +3531,8 @@  static void emulate_gate_op(struct cpu_user_regs *regs)
             rc = __put_user(item, stkp); \
             if ( rc ) \
             { \
-                propagate_page_fault((unsigned long)(stkp + 1) - rc, \
-                                     PFEC_write_access); \
+                pv_inject_page_fault(PFEC_write_access, \
+                                     (unsigned long)(stkp + 1) - rc); \
                 return; \
             } \
         } while ( 0 )
@@ -3597,7 +3599,7 @@  static void emulate_gate_op(struct cpu_user_regs *regs)
                     rc = __get_user(parm, ustkp);
                     if ( rc )
                     {
-                        propagate_page_fault((unsigned long)(ustkp + 1) - rc, 0);
+                        pv_inject_page_fault(0, (unsigned long)(ustkp + 1) - rc);
                         return;
                     }
                     push(parm);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index f6a40eb..39cc658 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -8,6 +8,7 @@ 
 #include <asm/hvm/domain.h>
 #include <asm/e820.h>
 #include <asm/mce.h>
+#include <asm/x86_emulate.h>
 #include <public/vcpu.h>
 #include <public/hvm/hvm_info_table.h>
 
@@ -632,6 +633,31 @@  static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
 struct vcpu_hvm_context;
 int arch_set_info_hvm_guest(struct vcpu *v, const struct vcpu_hvm_context *ctx);
 
+void pv_inject_event(const struct x86_event *event);
+
+static inline void pv_inject_hw_exception(unsigned int vector, int errcode)
+{
+    const struct x86_event event = {
+        .vector = vector,
+        .type = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code = errcode,
+    };
+
+    pv_inject_event(&event);
+}
+
+static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
+{
+    const struct x86_event event = {
+        .vector = TRAP_page_fault,
+        .type = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code = errcode,
+        .cr2 = cr2,
+    };
+
+    pv_inject_event(&event);
+}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 1b4d1c3..a15029c 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -539,7 +539,6 @@  int new_guest_cr3(unsigned long pfn);
 void make_cr3(struct vcpu *v, unsigned long mfn);
 void update_cr3(struct vcpu *v);
 int vcpu_destroy_pagetables(struct vcpu *);
-struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 int __sync_local_execstate(void);