diff mbox

[v3,17/24] x86/pv: Avoid raising faults behind the emulators back

Message ID 1480513841-7565-18-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Nov. 30, 2016, 1:50 p.m. UTC
Use x86_emul_pagefault() rather than pv_inject_page_fault() to cause raised
pagefaults to be known to the emulator.  This requires altering the callers of
x86_emulate() to properly re-inject the event.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v3:
 * Split out #DB handling to an earlier part of the series
 * Don't raise #GP faults for unexpected events, but do return back to the
   guest.
v2:
 * New
---
 xen/arch/x86/mm.c | 104 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 65 insertions(+), 39 deletions(-)

Comments

Tim Deegan Dec. 1, 2016, 11:50 a.m. UTC | #1
At 13:50 +0000 on 30 Nov (1480513834), Andrew Cooper wrote:
> Use x86_emul_pagefault() rather than pv_inject_page_fault() to cause raised
> pagefaults to be known to the emulator.  This requires altering the callers of
> x86_emulate() to properly re-inject the event.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich Dec. 1, 2016, 12:57 p.m. UTC | #2
>>> On 30.11.16 at 14:50, <andrew.cooper3@citrix.com> wrote:
> @@ -5379,30 +5380,41 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
>      page_unlock(page);
>      put_page(page);
>  
> -    /*
> -     * The previous lack of inject_{sw,hw}*() hooks caused exceptions raised
> -     * by the emulator itself to become X86EMUL_UNHANDLEABLE.  Such exceptions
> -     * now set event_pending instead.  Exceptions raised behind the back of
> -     * the emulator don't yet set event_pending.
> -     *
> -     * For now, cause such cases to return to the X86EMUL_UNHANDLEABLE path,
> -     * for no functional change from before.  Future patches will fix this
> -     * properly.
> -     */
> -    if ( rc == X86EMUL_EXCEPTION && ptwr_ctxt.ctxt.event_pending )
> -        rc = X86EMUL_UNHANDLEABLE;
> +    /* More strict than x86_emulate_wrapper(), as this is now true for PV. */
> +    ASSERT(ptwr_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
>  
> -    if ( rc == X86EMUL_UNHANDLEABLE )
> -        goto bail;
> +    switch ( rc )
> +    {
> +    case X86EMUL_EXCEPTION:
> +        /*
> +         * This emulation only covers writes to pagetables which marked

It looks to me as if either the "which" wants to be dropped, or "are" /
"have been" be inserted after it.

> +         * read-only by Xen.  We tolerate #PF (from hitting an adjacent page).

Why "adjacent"? Aiui the only legitimate #PF here can be from a
page having got paged out by the guest by the time here, and that
could be either the page table page itself, or the page(s) holding
the instruction (which normally aren't adjacent at all).

> +         * Anything else is an emulation bug, or a guest playing with the
> +         * instruction stream under Xen's feet.
> +         */
> +        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
> +             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
> +            pv_inject_event(&ptwr_ctxt.ctxt.event);
> +        else
> +            gdprintk(XENLOG_WARNING,
> +                     "Unexpected event (type %u, vector %#x) from emulation\n",
> +                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
> +
> +        /* Fallthrough */
> +    case X86EMUL_OKAY:
>  
> -    if ( ptwr_ctxt.ctxt.retire.singlestep )
> -        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
> +        if ( ptwr_ctxt.ctxt.retire.singlestep )
> +            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
>  
> -    perfc_incr(ptwr_emulations);
> -    return EXCRET_fault_fixed;
> +        /* Fallthrough */
> +    case X86EMUL_RETRY:
> +        perfc_incr(ptwr_emulations);
> +        return EXCRET_fault_fixed;
>  
>   bail:
> -    return 0;
> +    default:
> +        return 0;
> +    }
>  }

I think omitting the default label would not only make the patch
slightly smaller, but also result in the bail label look less misplaced.

With at least the comment aspect above taken care of,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Andrew Cooper Dec. 1, 2016, 1:12 p.m. UTC | #3
On 01/12/16 12:57, Jan Beulich wrote:
>>>> On 30.11.16 at 14:50, <andrew.cooper3@citrix.com> wrote:
>> @@ -5379,30 +5380,41 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
>>      page_unlock(page);
>>      put_page(page);
>>  
>> -    /*
>> -     * The previous lack of inject_{sw,hw}*() hooks caused exceptions raised
>> -     * by the emulator itself to become X86EMUL_UNHANDLEABLE.  Such exceptions
>> -     * now set event_pending instead.  Exceptions raised behind the back of
>> -     * the emulator don't yet set event_pending.
>> -     *
>> -     * For now, cause such cases to return to the X86EMUL_UNHANDLEABLE path,
>> -     * for no functional change from before.  Future patches will fix this
>> -     * properly.
>> -     */
>> -    if ( rc == X86EMUL_EXCEPTION && ptwr_ctxt.ctxt.event_pending )
>> -        rc = X86EMUL_UNHANDLEABLE;
>> +    /* More strict than x86_emulate_wrapper(), as this is now true for PV. */
>> +    ASSERT(ptwr_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
>>  
>> -    if ( rc == X86EMUL_UNHANDLEABLE )
>> -        goto bail;
>> +    switch ( rc )
>> +    {
>> +    case X86EMUL_EXCEPTION:
>> +        /*
>> +         * This emulation only covers writes to pagetables which marked
> It looks to me as if either the "which" wants to be dropped, or "are" /
> "have been" be inserted after it.

I meant "which are".

>
>> +         * read-only by Xen.  We tolerate #PF (from hitting an adjacent page).
> Why "adjacent"? Aiui the only legitimate #PF here can be from a
> page having got paged out by the guest by the time here, and that
> could be either the page table page itself, or the page(s) holding
> the instruction (which normally aren't adjacent at all).

This is less clear-cut than the subsequent case, as non-aligned accesses
are disallowed, so simply misaligning a write across the page boundary
won't result in the emulator raising #PF.

One issue I have decided to defer is the behaviour of UNHANDLEABLE in
this case.  If the #PF we are servicing is caused by a misaligned write
to a l1e, instead of explicitly re-injecting #PF, we let the logic
continue searching for all other cases which may have caused a #PF.

It would be better to explicitly disallow the access by re-raising #PF
than returning UNHANDLEABLE, as it skips the rest of the pagefault handler.

>
>> +         * Anything else is an emulation bug, or a guest playing with the
>> +         * instruction stream under Xen's feet.
>> +         */
>> +        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
>> +            pv_inject_event(&ptwr_ctxt.ctxt.event);
>> +        else
>> +            gdprintk(XENLOG_WARNING,
>> +                     "Unexpected event (type %u, vector %#x) from emulation\n",
>> +                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
>> +
>> +        /* Fallthrough */
>> +    case X86EMUL_OKAY:
>>  
>> -    if ( ptwr_ctxt.ctxt.retire.singlestep )
>> -        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
>> +        if ( ptwr_ctxt.ctxt.retire.singlestep )
>> +            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
>>  
>> -    perfc_incr(ptwr_emulations);
>> -    return EXCRET_fault_fixed;
>> +        /* Fallthrough */
>> +    case X86EMUL_RETRY:
>> +        perfc_incr(ptwr_emulations);
>> +        return EXCRET_fault_fixed;
>>  
>>   bail:
>> -    return 0;
>> +    default:
>> +        return 0;
>> +    }
>>  }
> I think omitting the default label would not only make the patch
> slightly smaller, but also result in the bail label look less misplaced.

The default label is needed to cover the UNHANDLEABLE case.

~Andrew

>
> With at least the comment aspect above taken care of,
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> Jan
>
Jan Beulich Dec. 1, 2016, 1:27 p.m. UTC | #4
>>> On 01.12.16 at 14:12, <andrew.cooper3@citrix.com> wrote:
> On 01/12/16 12:57, Jan Beulich wrote:
>>>>> On 30.11.16 at 14:50, <andrew.cooper3@citrix.com> wrote:
>>> @@ -5379,30 +5380,41 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
>>>      page_unlock(page);
>>>      put_page(page);
>>>  
>>> -    /*
>>> -     * The previous lack of inject_{sw,hw}*() hooks caused exceptions raised
>>> -     * by the emulator itself to become X86EMUL_UNHANDLEABLE.  Such exceptions
>>> -     * now set event_pending instead.  Exceptions raised behind the back of
>>> -     * the emulator don't yet set event_pending.
>>> -     *
>>> -     * For now, cause such cases to return to the X86EMUL_UNHANDLEABLE path,
>>> -     * for no functional change from before.  Future patches will fix this
>>> -     * properly.
>>> -     */
>>> -    if ( rc == X86EMUL_EXCEPTION && ptwr_ctxt.ctxt.event_pending )
>>> -        rc = X86EMUL_UNHANDLEABLE;
>>> +    /* More strict than x86_emulate_wrapper(), as this is now true for PV. */
>>> +    ASSERT(ptwr_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
>>>  
>>> -    if ( rc == X86EMUL_UNHANDLEABLE )
>>> -        goto bail;
>>> +    switch ( rc )
>>> +    {
>>> +    case X86EMUL_EXCEPTION:
>>> +        /*
>>> +         * This emulation only covers writes to pagetables which marked
>>> +         * read-only by Xen.  We tolerate #PF (from hitting an adjacent page).
>> Why "adjacent"? Aiui the only legitimate #PF here can be from a
>> page having got paged out by the guest by the time here, and that
>> could be either the page table page itself, or the page(s) holding
>> the instruction (which normally aren't adjacent at all).
> 
> This is less clear-cut than the subsequent case, as non-aligned accesses
> are disallowed, so simply misaligning a write across the page boundary
> won't result in the emulator raising #PF.

I don't understand - what does this have to do with possibly getting
#PF from fetching the insn?

> One issue I have decided to defer is the behaviour of UNHANDLEABLE in
> this case.  If the #PF we are servicing is caused by a misaligned write
> to a l1e, instead of explicitly re-injecting #PF, we let the logic
> continue searching for all other cases which may have caused a #PF.
> 
> It would be better to explicitly disallow the access by re-raising #PF
> than returning UNHANDLEABLE, as it skips the rest of the pagefault handler.

At the point of the check we don't know yet whether we're dealing
with a page table, so we need to give other handlers a chance.

>>> +         * Anything else is an emulation bug, or a guest playing with the
>>> +         * instruction stream under Xen's feet.
>>> +         */
>>> +        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
>>> +             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
>>> +            pv_inject_event(&ptwr_ctxt.ctxt.event);
>>> +        else
>>> +            gdprintk(XENLOG_WARNING,
>>> +                     "Unexpected event (type %u, vector %#x) from emulation\n",
>>> +                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
>>> +
>>> +        /* Fallthrough */
>>> +    case X86EMUL_OKAY:
>>>  
>>> -    if ( ptwr_ctxt.ctxt.retire.singlestep )
>>> -        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
>>> +        if ( ptwr_ctxt.ctxt.retire.singlestep )
>>> +            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
>>>  
>>> -    perfc_incr(ptwr_emulations);
>>> -    return EXCRET_fault_fixed;
>>> +        /* Fallthrough */
>>> +    case X86EMUL_RETRY:
>>> +        perfc_incr(ptwr_emulations);
>>> +        return EXCRET_fault_fixed;
>>>  
>>>   bail:
>>> -    return 0;
>>> +    default:
>>> +        return 0;
>>> +    }
>>>  }
>> I think omitting the default label would not only make the patch
>> slightly smaller, but also result in the bail label look less misplaced.
> 
> The default label is needed to cover the UNHANDLEABLE case.

Certainly not - it can just fall out of the switch statement, reaching
the pre-existing "bail" label. I could see your point if rc was of an
enum type ...

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5d59479..cdfa85e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5136,7 +5136,7 @@  static int ptwr_emulated_read(
     if ( !__addr_ok(addr) ||
          (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
     {
-        pv_inject_page_fault(0, addr + bytes - rc); /* Read fault. */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -5177,8 +5177,9 @@  static int ptwr_emulated_update(
         addr &= ~(sizeof(paddr_t)-1);
         if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
         {
-            pv_inject_page_fault(0, /* Read fault. */
-                                 addr + sizeof(paddr_t) - rc);
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
             return X86EMUL_EXCEPTION;
         }
         /* Mask out bits provided by caller. */
@@ -5379,30 +5380,41 @@  int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
     page_unlock(page);
     put_page(page);
 
-    /*
-     * The previous lack of inject_{sw,hw}*() hooks caused exceptions raised
-     * by the emulator itself to become X86EMUL_UNHANDLEABLE.  Such exceptions
-     * now set event_pending instead.  Exceptions raised behind the back of
-     * the emulator don't yet set event_pending.
-     *
-     * For now, cause such cases to return to the X86EMUL_UNHANDLEABLE path,
-     * for no functional change from before.  Future patches will fix this
-     * properly.
-     */
-    if ( rc == X86EMUL_EXCEPTION && ptwr_ctxt.ctxt.event_pending )
-        rc = X86EMUL_UNHANDLEABLE;
+    /* More strict than x86_emulate_wrapper(), as this is now true for PV. */
+    ASSERT(ptwr_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
 
-    if ( rc == X86EMUL_UNHANDLEABLE )
-        goto bail;
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which marked
+         * read-only by Xen.  We tolerate #PF (from hitting an adjacent page).
+         * Anything else is an emulation bug, or a guest playing with the
+         * instruction stream under Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
 
-    if ( ptwr_ctxt.ctxt.retire.singlestep )
-        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+        if ( ptwr_ctxt.ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
 
-    perfc_incr(ptwr_emulations);
-    return EXCRET_fault_fixed;
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
 
  bail:
-    return 0;
+    default:
+        return 0;
+    }
 }
 
 /*************************
@@ -5519,26 +5531,40 @@  int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
     else
         rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
 
-    /*
-     * The previous lack of inject_{sw,hw}*() hooks caused exceptions raised
-     * by the emulator itself to become X86EMUL_UNHANDLEABLE.  Such exceptions
-     * now set event_pending instead.  Exceptions raised behind the back of
-     * the emulator don't yet set event_pending.
-     *
-     * For now, cause such cases to return to the X86EMUL_UNHANDLEABLE path,
-     * for no functional change from before.  Future patches will fix this
-     * properly.
-     */
-    if ( rc == X86EMUL_EXCEPTION && ctxt.event_pending )
-        rc = X86EMUL_UNHANDLEABLE;
+    /* More strict than x86_emulate_wrapper(), as this is now true for PV. */
+    ASSERT(ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
 
-    if ( rc == X86EMUL_UNHANDLEABLE )
-        return 0;
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page).  Anything else is
+         * an emulation bug, or a guest playing with the instruction stream
+         * under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
 
-    if ( ctxt.retire.singlestep )
-        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
 
-    return EXCRET_fault_fixed;
+    default:
+        return 0;
+    }
 }
 
 void *alloc_xen_pagetable(void)