
[v12] x86/emulate: Send vm_event from emulate

Message ID 20190920121636.2573-1-aisaila@bitdefender.com (mailing list archive)
State Superseded
Series [v12] x86/emulate: Send vm_event from emulate

Commit Message

Alexandru Stefan ISAILA Sept. 20, 2019, 12:16 p.m. UTC
A/D bit writes performed during page walks can be considered benign by an
introspection agent, so receiving vm_events for them is a pessimization.
We try here to optimize this by filtering these events out.

Currently, we fully emulate the instruction at RIP when the hardware sees
an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
incorrect, because the instruction at RIP might legitimately cause an
EPT fault of its own while accessing a _different_ page from the original
one (the one whose A/D bits were set).

The solution is to perform the whole emulation, ignoring EPT restrictions
for the walk part but taking them into account for the "actual" emulation
of the instruction at RIP. When we send out a vm_event, we don't want the
emulation to complete, since in that case we won't be able to veto whatever
it is doing. That would mean that we can't actually prevent any malicious
activity; instead, we'd only be able to report on it.

When we see a "send-vm_event" case while emulating, we need to first send
the event out and then suspend the emulation (return X86EMUL_RETRY).
After the emulation stops, hvm_vm_event_do_resume() will be called again
once the introspection agent has handled the event and resumed the guest.
There, the instruction at RIP will be fully emulated (with the EPT ignored)
if the introspection application allows it, and the guest will continue to
run past the instruction.
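The send-then-suspend flow described above can be modelled in a few lines of standalone C. This is a hypothetical sketch, not Xen code: `struct access`, `struct policy`, `violates()` and `emulate_insn()` are invented for illustration, and a single restricted frame stands in for the real mem_access settings. Page-walk accesses skip the check, while the instruction's own accesses are checked; a violation sends an event and stops emulation with a retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names are Xen APIs. */
enum emul_rc { EMUL_OKAY, EMUL_RETRY };

struct access {
    unsigned long gfn;
    bool is_walk;   /* A/D update performed by the page walk itself */
};

/* Stand-in for mem_access settings: one restricted frame. */
struct policy {
    unsigned long restricted_gfn;
    unsigned int events_sent;
};

static bool violates(const struct policy *p, unsigned long gfn)
{
    return gfn == p->restricted_gfn;
}

/*
 * Emulate one instruction. Walk accesses ignore EPT restrictions; the
 * instruction's own accesses are checked. On a violation an event is sent
 * and emulation stops with RETRY, so the agent can still veto the access.
 */
static enum emul_rc emulate_insn(struct policy *p, const struct access *acc,
                                 size_t n, bool send_event)
{
    for ( size_t i = 0; i < n; i++ )
    {
        if ( acc[i].is_walk )
            continue;               /* benign A/D write: no event */

        if ( send_event && violates(p, acc[i].gfn) )
        {
            p->events_sent++;       /* the vm_event goes out first ... */
            return EMUL_RETRY;      /* ... then emulation is suspended */
        }
    }

    return EMUL_OKAY;               /* every access may be committed */
}
```

On the first pass (`send_event` true) the access to the restricted frame yields `EMUL_RETRY` after exactly one event; once the agent allows the access, a second pass with checking disabled completes with `EMUL_OKAY`. A walk-only access to the same frame never produces an event.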

A common example: the hardware exits because of an EPT fault caused by a
page walk, and p2m_mem_access_check() decides whether to send a vm_event.
If the vm_event is sent and the agent's response lets the instruction at
RIP run, that instruction might also hit a protected page and provoke a
second vm_event.

Now, if npfec.kind == npfec_kind_in_gpt and
d->arch.monitor.inguest_pagefault_disabled is true, we are in the page walk
case and can apply this optimization: emulate the page walk while ignoring
the EPT, but do not ignore the EPT when emulating the actual instruction.

In the first case we would get 2 EPT events; in the second case we get
only 1, and only if the instruction at RIP itself triggers an EPT event.
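The change of trigger condition in p2m_mem_access_check() (from `npfec.kind != npfec_kind_with_gla` to `npfec.kind == npfec_kind_in_gpt`) can be sketched as two predicates. The enum below only mirrors the three npfec kinds for illustration (its numeric values are assumed), and `old_emulates()`, `new_emulates()` and `ept_events()` are hypothetical helpers; `ring` stands for `vm_event_check_ring(d->vm_event_monitor)` and `pf_disabled` for `d->arch.monitor.inguest_pagefault_disabled`.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the npfec kinds; numeric values are assumed. */
enum npfec_kind {
    npfec_kind_unknown,
    npfec_kind_in_gpt,
    npfec_kind_with_gla,
};

/* Old condition: any fault not tied to a guest linear address. */
static bool old_emulates(bool ring, bool pf_disabled, enum npfec_kind kind)
{
    return ring && pf_disabled && kind != npfec_kind_with_gla;
}

/* New condition: only faults raised while walking the guest page tables. */
static bool new_emulates(bool ring, bool pf_disabled, enum npfec_kind kind)
{
    return ring && pf_disabled && kind == npfec_kind_in_gpt;
}

/*
 * EPT events seen when the faulting instruction also touches a protected
 * page: one for the walk unless it is filtered, plus one for the
 * instruction's own access.
 */
static unsigned int ept_events(bool walk_filtered)
{
    return (walk_filtered ? 0 : 1) + 1;
}
```

One consequence the predicates make visible: faults of kind `npfec_kind_unknown` used to take the emulation path and no longer do.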

We use hvmemul_map_linear_addr() to intercept write access and
__hvm_copy() to intercept exec, read and write access.

In order to have __hvm_copy() issue ~X86EMUL_RETRY, a new return type,
HVMTRANS_need_retry, was added, and all the places that consume HVMTRANS*
and needed adjustment were changed accordingly.
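A minimal sketch of how a consumer such as linear_read() folds the new value into its existing retry path, matching the `case HVMTRANS_need_retry:` lines the patch adds. The enums are local mirrors with illustrative values and `trans_to_emul()` is a hypothetical helper; the real consumers handle the remaining results individually rather than collapsing them into `X86EMUL_UNHANDLEABLE` as done here for brevity.

```c
#include <assert.h>

/* Local mirrors of the Xen enums; values are illustrative only. */
enum hvm_translation_result {
    HVMTRANS_okay,
    HVMTRANS_bad_linear_to_gfn,
    HVMTRANS_bad_gfn_to_mfn,
    HVMTRANS_unhandleable,
    HVMTRANS_gfn_paged_out,
    HVMTRANS_gfn_shared,
    HVMTRANS_need_retry,
};

enum { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE, X86EMUL_RETRY };

/* Hypothetical helper: translation result -> emulator return code. */
static int trans_to_emul(enum hvm_translation_result res)
{
    switch ( res )
    {
    case HVMTRANS_okay:
        return X86EMUL_OKAY;

    case HVMTRANS_gfn_paged_out:
    case HVMTRANS_gfn_shared:
    case HVMTRANS_need_retry:   /* a vm_event was sent: retry later */
        return X86EMUL_RETRY;

    default:                    /* simplified: real code differs per case */
        return X86EMUL_UNHANDLEABLE;
    }
}
```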

hvm_monitor_check_p2m() returns false if there was no violation, or if
monitor_traps() or p2m_get_mem_access() returned an error. -ESRCH from
p2m_get_mem_access() is treated as restricted access (XENMEM_access_n).
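The decision logic of hvm_monitor_check_p2m() can be exercised outside Xen by mirroring its `switch` on the page's access type. In this hypothetical sketch the constants and the `xenmem_access_t` enum are local stand-ins (bit values chosen for illustration, and the rx2rw/n2rwx/default types omitted), and `violation_flags()` takes the lookup's return code directly instead of calling p2m_get_mem_access(). A zero result corresponds to the "no violation" `false` return.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins; bit values chosen for illustration. */
#define PFEC_write_access (1u << 1)
#define PFEC_insn_fetch   (1u << 4)
#define MEM_ACCESS_R      (1u << 0)
#define MEM_ACCESS_W      (1u << 1)
#define MEM_ACCESS_X      (1u << 2)

typedef enum {
    XENMEM_access_n, XENMEM_access_r, XENMEM_access_w, XENMEM_access_rw,
    XENMEM_access_x, XENMEM_access_rx, XENMEM_access_wx, XENMEM_access_rwx,
} xenmem_access_t;

/*
 * rc is the (hypothetical) access lookup's return code: -ESRCH means
 * "treat the page as inaccessible"; any other error means "no event".
 */
static uint32_t violation_flags(int rc, xenmem_access_t access, uint32_t pfec)
{
    uint32_t flags = 0;

    if ( rc == -ESRCH )
        access = XENMEM_access_n;
    else if ( rc )
        return 0;

    switch ( access )
    {
    case XENMEM_access_x:
    case XENMEM_access_rx:
        if ( pfec & PFEC_write_access )
            flags = MEM_ACCESS_R | MEM_ACCESS_W;
        break;

    case XENMEM_access_w:
    case XENMEM_access_rw:
        if ( pfec & PFEC_insn_fetch )
            flags = MEM_ACCESS_X;
        break;

    case XENMEM_access_r:
    case XENMEM_access_n:
        if ( pfec & PFEC_write_access )
            flags |= MEM_ACCESS_R | MEM_ACCESS_W;
        if ( pfec & PFEC_insn_fetch )
            flags |= MEM_ACCESS_X;
        break;

    default:                /* wx, rwx: nothing to report */
        break;
    }

    return flags;           /* 0: no violation, so no event is sent */
}
```

As in the patch, `pfec` has no bit for reads, so a plain read produces no flags here, even for `XENMEM_access_n`.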

NOTE: hvm_monitor_check_p2m() assumes the caller will enable/disable
arch.vm_event->send_event.
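The NOTE describes a split of responsibilities: the caller turns `send_event` on around the emulation, and the check itself only asserts that it is set. A hypothetical sketch of that contract, using pared-down stand-ins for `struct vcpu` and `struct arch_vm_event` (not Xen's real definitions) and invented helpers `check_p2m()`, `emulate_one()` and `emulate_page_walk_fault()`; the caller mirrors the lines the patch adds to p2m_mem_access_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the vCPU state in the patch. */
struct arch_vm_event { bool send_event; };
struct vcpu { struct arch_vm_event *vm_event; };

static unsigned int checks;     /* counts checks made with the flag set */

/* Callee side: like the real check, it only asserts the flag is set. */
static bool check_p2m(const struct vcpu *v)
{
    assert(v->vm_event->send_event);
    checks++;
    return false;               /* pretend: no violation */
}

/* Emulator stand-in: consults the flag before invoking the check. */
static void emulate_one(struct vcpu *v)
{
    if ( v->vm_event && v->vm_event->send_event )
        (void)check_p2m(v);
}

/* Caller side: enable, emulate, disable. */
static void emulate_page_walk_fault(struct vcpu *v)
{
    v->vm_event->send_event = true;
    emulate_one(v);
    v->vm_event->send_event = false;
}
```

Keeping the toggle in the caller means ordinary emulation paths never reach the check, and the flag is always cleared again once the page-walk emulation returns.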

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>

---
Changes since V11:
	- Rename HVMTRANS_bad_gfn_access to HVMTRANS_need_retry
	- Check unlikely(v->arch.vm_event) first
	- Move send_event disable from hvm_monitor_check_p2m() to the caller
	- Add the missing HVMTRANS_need_retry checks for HVMTRANS* consumers.
---
 xen/arch/x86/hvm/emulate.c        | 18 ++++++-
 xen/arch/x86/hvm/hvm.c            |  9 ++++
 xen/arch/x86/hvm/intercept.c      |  2 +
 xen/arch/x86/hvm/monitor.c        | 78 +++++++++++++++++++++++++++++++
 xen/arch/x86/mm/mem_access.c      |  9 +++-
 xen/arch/x86/mm/shadow/hvm.c      |  1 +
 xen/include/asm-x86/hvm/monitor.h |  3 ++
 xen/include/asm-x86/hvm/support.h |  1 +
 xen/include/asm-x86/vm_event.h    |  2 +
 9 files changed, 121 insertions(+), 2 deletions(-)

Comments

Jan Beulich Sept. 20, 2019, 2:22 p.m. UTC | #1
On 20.09.2019 14:16, Alexandru Stefan ISAILA wrote:
> In order to have __hvm_copy() issue ~X86EMUL_RETRY a new return type,
> HVMTRANS_need_retry, was added and all the places that consume HVMTRANS*
> and needed adjustment where changed accordingly.

This is wrong and hence confusing: __hvm_copy() would never return
~X86EMUL_RETRY. In fact I think you've confused yourself enough to
make a questionable (possibly resulting) change:

> @@ -582,7 +583,7 @@ static void *hvmemul_map_linear_addr(
>          ASSERT(mfn_x(*mfn) == 0);
>  
>          res = hvm_translate_get_page(curr, addr, true, pfec,
> -                                     &pfinfo, &page, NULL, &p2mt);
> +                                     &pfinfo, &page, &gfn, &p2mt);

This function ...

>          switch ( res )
>          {
> @@ -601,6 +602,7 @@ static void *hvmemul_map_linear_addr(
>  
>          case HVMTRANS_gfn_paged_out:
>          case HVMTRANS_gfn_shared:
> +        case HVMTRANS_need_retry:

... can't return this value, so you should omit this addition,
letting the new return value go through "default:".

> @@ -1852,6 +1864,8 @@ static int hvmemul_rep_movs(
>  
>      xfree(buf);
>  
> +    ASSERT(rc != HVMTRANS_need_retry);
> +
>      if ( rc == HVMTRANS_gfn_paged_out )
>          return X86EMUL_RETRY;
>      if ( rc == HVMTRANS_gfn_shared )
> @@ -1964,6 +1978,8 @@ static int hvmemul_rep_stos(
>          if ( buf != p_data )
>              xfree(buf);
>  
> +        ASSERT(rc != HVMTRANS_need_retry);
> +
>          switch ( rc )
>          {
>          case HVMTRANS_gfn_paged_out:

Looking at this again, I think it would better be an addition to
the switch() (using ASSERT_UNREACHABLE()). Generally this is
true for the rep_movs case as well, but that one would first
need converting to switch(), which I agree is beyond the scope
of this change. In both cases a brief comment would seem
worthwhile adding, clarifying that the new return value can
result from hvm_copy_*_guest_linear() only. This might become
relevant in particular if, down the road, we invent more cases
where HVMTRANS_need_retry is produced.

Then again maybe switching rep_movs to switch() would still be
a good thing to do here: Don't you agree that from an abstract
pov in both cases above X86EMUL_RETRY should be produced, if at
a future point physical accesses could also produce
HVMTRANS_need_retry? With this retaining the assertions is
certainly an option, but I think the fallback return value for
this case should still be X86EMUL_RETRY.

Jan
Alexandru Stefan ISAILA Sept. 20, 2019, 2:59 p.m. UTC | #2
On 20.09.2019 17:22, Jan Beulich wrote:
> On 20.09.2019 14:16, Alexandru Stefan ISAILA wrote:
>> In order to have __hvm_copy() issue ~X86EMUL_RETRY a new return type,
>> HVMTRANS_need_retry, was added and all the places that consume HVMTRANS*
>> and needed adjustment where changed accordingly.
> 
> This is wrong and hence confusing: __hvm_copy() would never return
> ~X86EMUL_RETRY. In fact I think you've confused yourself enough to
> make a questionable (possibly resulting) change:

The idea was to get X86EMUL_RETRY down the line from __hvm_copy().
I will adjust this.

> 
>> @@ -582,7 +583,7 @@ static void *hvmemul_map_linear_addr(
>>           ASSERT(mfn_x(*mfn) == 0);
>>   
>>           res = hvm_translate_get_page(curr, addr, true, pfec,
>> -                                     &pfinfo, &page, NULL, &p2mt);
>> +                                     &pfinfo, &page, &gfn, &p2mt);
> 
> This function ...
> 
>>           switch ( res )
>>           {
>> @@ -601,6 +602,7 @@ static void *hvmemul_map_linear_addr(
>>   
>>           case HVMTRANS_gfn_paged_out:
>>           case HVMTRANS_gfn_shared:
>> +        case HVMTRANS_need_retry:
> 
> ... can't return this value, so you should omit this addition,
> letting the new return value go through "default:".

It is very clear that HVMTRANS_need_retry will not be returned from that
function, at least for now. I thought you wanted to have every possible
case covered in the switch. I can remove that case; there is no problem
here because, as I've said, it will never enter that case.

But as you said, later work with HVMTRANS_need_retry will result in
returning X86EMUL_RETRY. Anyway, it's your call whether I should remove
it or not.

> 
>> @@ -1852,6 +1864,8 @@ static int hvmemul_rep_movs(
>>   
>>       xfree(buf);
>>   
>> +    ASSERT(rc != HVMTRANS_need_retry);
>> +
>>       if ( rc == HVMTRANS_gfn_paged_out )
>>           return X86EMUL_RETRY;
>>       if ( rc == HVMTRANS_gfn_shared )
>> @@ -1964,6 +1978,8 @@ static int hvmemul_rep_stos(
>>           if ( buf != p_data )
>>               xfree(buf);
>>   
>> +        ASSERT(rc != HVMTRANS_need_retry);
>> +
>>           switch ( rc )
>>           {
>>           case HVMTRANS_gfn_paged_out:
> 
> Looking at this again, I think it would better be an addition to
> the switch() (using ASSERT_UNREACHABLE()). Generally this is
> true for the rep_movs case as well, but that one would first
> need converting to switch(), which I agree is beyond the scope

I agree that this is beyond the scope of this patch, but it's not that
big of a change and it can be done.

But wouldn't having a default ASSERT_UNREACHABLE() in both switch
statements change the behavior of the function?

> of this change. In both cases a brief comment would seem
> worthwhile adding, clarifying that the new return value can
> result from hvm_copy_*_guest_linear() only. This might become
> relevant in particular if, down the road, we invent more cases
> where HVMTRANS_need_retry is produced.

Is this comment aimed at the commit message or somewhere else?

> 
> Then again maybe switching rep_movs to switch() would still be
> a good thing to do here: Don't you agree that from an abstract
> pov in both cases above X86EMUL_RETRY should be produced, if at
> a future point physical accesses could also produce
> HVMTRANS_need_retry? With this retaining the assertions is
> certainly an option, but I think the fallback return value for
> this case should still be X86EMUL_RETRY.
>
Jan Beulich Sept. 20, 2019, 3:20 p.m. UTC | #3
On 20.09.2019 16:59, Alexandru Stefan ISAILA wrote:
> 
> 
> On 20.09.2019 17:22, Jan Beulich wrote:
>> On 20.09.2019 14:16, Alexandru Stefan ISAILA wrote:
>>> In order to have __hvm_copy() issue ~X86EMUL_RETRY a new return type,
>>> HVMTRANS_need_retry, was added and all the places that consume HVMTRANS*
>>> and needed adjustment where changed accordingly.
>>
>> This is wrong and hence confusing: __hvm_copy() would never return
>> ~X86EMUL_RETRY. In fact I think you've confused yourself enough to
>> make a questionable (possibly resulting) change:
> 
> The idea was to get X86EMUL_RETRY down the line from __hvm_copy().
> I will adjust this.
> 
>>
>>> @@ -582,7 +583,7 @@ static void *hvmemul_map_linear_addr(
>>>           ASSERT(mfn_x(*mfn) == 0);
>>>   
>>>           res = hvm_translate_get_page(curr, addr, true, pfec,
>>> -                                     &pfinfo, &page, NULL, &p2mt);
>>> +                                     &pfinfo, &page, &gfn, &p2mt);
>>
>> This function ...
>>
>>>           switch ( res )
>>>           {
>>> @@ -601,6 +602,7 @@ static void *hvmemul_map_linear_addr(
>>>   
>>>           case HVMTRANS_gfn_paged_out:
>>>           case HVMTRANS_gfn_shared:
>>> +        case HVMTRANS_need_retry:
>>
>> ... can't return this value, so you should omit this addition,
>> letting the new return value go through "default:".
> 
> It is very clear that HVMTRANS_need_retry will not be returned from that
> function, at least for now. I thought you wanted to have every possible
> case covered in the switch. I can remove that case; there is no problem
> here because, as I've said, it will never enter that case.
>
> But as you said, later work with HVMTRANS_need_retry will result in
> returning X86EMUL_RETRY. Anyway, it's your call whether I should remove
> it or not.

The result should be consistent (i.e. between the case here
and the rep_movs / rep_stos cases below). Overall I think it
would be cleanest if in all three cases an ASSERT_UNREACHABLE()
fell through to a "return X86EMUL_RETRY;".

>>> @@ -1852,6 +1864,8 @@ static int hvmemul_rep_movs(
>>>   
>>>       xfree(buf);
>>>   
>>> +    ASSERT(rc != HVMTRANS_need_retry);
>>> +
>>>       if ( rc == HVMTRANS_gfn_paged_out )
>>>           return X86EMUL_RETRY;
>>>       if ( rc == HVMTRANS_gfn_shared )
>>> @@ -1964,6 +1978,8 @@ static int hvmemul_rep_stos(
>>>           if ( buf != p_data )
>>>               xfree(buf);
>>>   
>>> +        ASSERT(rc != HVMTRANS_need_retry);
>>> +
>>>           switch ( rc )
>>>           {
>>>           case HVMTRANS_gfn_paged_out:
>>
>> Looking at this again, I think it would better be an addition to
>> the switch() (using ASSERT_UNREACHABLE()). Generally this is
>> true for the rep_movs case as well, but that one would first
>> need converting to switch(), which I agree is beyond the scope
> 
> I agree that this is beyond the scope of this patch, but it's not that
> big of a change and it can be done.
>
> But wouldn't having a default ASSERT_UNREACHABLE() in both switch
> statements change the behavior of the function?

It shouldn't be the default case that gains this assertion,
but the HVMTRANS_need_retry one that is to be added.

>> of this change. In both cases a brief comment would seem
>> worthwhile adding, clarifying that the new return value can
>> result from hvm_copy_*_guest_linear() only. This might become
>> relevant in particular if, down the road, we invent more cases
>> where HVMTRANS_need_retry is produced.
> 
> Is this comment aimed at the commit message or somewhere else?

If you go the ASSERT_UNREACHABLE() route, then the comment(s)
should be code comments next to these assertions. They'd be
there to avoid people having to dig out the reason for why
they're there, to make it easy to decide whether it is safe to
drop them once some new producer of HVMTRANS_need_retry would
appear.

Jan
Alexandru Stefan ISAILA Sept. 23, 2019, 9 a.m. UTC | #4
On 20.09.2019 18:20, Jan Beulich wrote:
> On 20.09.2019 16:59, Alexandru Stefan ISAILA wrote:
>>
>>
>> On 20.09.2019 17:22, Jan Beulich wrote:
>>> On 20.09.2019 14:16, Alexandru Stefan ISAILA wrote:
>>>> In order to have __hvm_copy() issue ~X86EMUL_RETRY a new return type,
>>>> HVMTRANS_need_retry, was added and all the places that consume HVMTRANS*
>>>> and needed adjustment where changed accordingly.
>>>
>>> This is wrong and hence confusing: __hvm_copy() would never return
>>> ~X86EMUL_RETRY. In fact I think you've confused yourself enough to
>>> make a questionable (possibly resulting) change:
>>
>> The idea was to get X86EMUL_RETRY down the line from __hvm_copy().
>> I will adjust this.

This will be changed to:
"A new return type was added, HVMTRANS_need_retry, in order to have all 
the places that consume HVMTRANS* return X86EMUL_RETRY."

>>
>>>
>>>> @@ -582,7 +583,7 @@ static void *hvmemul_map_linear_addr(
>>>>            ASSERT(mfn_x(*mfn) == 0);
>>>>    
>>>>            res = hvm_translate_get_page(curr, addr, true, pfec,
>>>> -                                     &pfinfo, &page, NULL, &p2mt);
>>>> +                                     &pfinfo, &page, &gfn, &p2mt);
>>>
>>> This function ...
>>>
>>>>            switch ( res )
>>>>            {
>>>> @@ -601,6 +602,7 @@ static void *hvmemul_map_linear_addr(
>>>>    
>>>>            case HVMTRANS_gfn_paged_out:
>>>>            case HVMTRANS_gfn_shared:
>>>> +        case HVMTRANS_need_retry:
>>>
>>> ... can't return this value, so you should omit this addition,
>>> letting the new return value go through "default:".
>>
>> It is very clear that HVMTRANS_need_retry will not be returned from that
>> function, at least for now. I thought you wanted to have every possible
>> case covered in the switch. I can remove that case; there is no problem
>> here because, as I've said, it will never enter that case.
>>
>> But as you said, later work with HVMTRANS_need_retry will result in
>> returning X86EMUL_RETRY. Anyway, it's your call whether I should remove
>> it or not.
> 
> The result should be consistent (i.e. between the case here
> and the rep_movs / rep_stos cases below). Overall I think it
> would be cleanest if in all three cases an ASSERT_UNREACHABLE()
> fell through to a "return X86EMUL_RETRY;".
> 

Ok, just to make sure this is what is needed and limit the patch 
versions, rep_movs / rep_stos should have a switch like this:

         switch ( rc )
         {
         case HVMTRANS_okay:
             return X86EMUL_OKAY;
         case HVMTRANS_need_retry:
             ASSERT_UNREACHABLE();
             /* fall through */
         case HVMTRANS_gfn_paged_out:
         case HVMTRANS_gfn_shared:
             return X86EMUL_RETRY;
         }

Then hvmemul_map_linear_addr() should have:

         case HVMTRANS_need_retry:
             ASSERT_UNREACHABLE();
             /* fall through */
         case HVMTRANS_gfn_shared:
         case HVMTRANS_gfn_paged_out:
             err = ERR_PTR(~X86EMUL_RETRY);


>>>> @@ -1852,6 +1864,8 @@ static int hvmemul_rep_movs(
>>>>    
>>>>        xfree(buf);
>>>>    
>>>> +    ASSERT(rc != HVMTRANS_need_retry);
>>>> +
>>>>        if ( rc == HVMTRANS_gfn_paged_out )
>>>>            return X86EMUL_RETRY;
>>>>        if ( rc == HVMTRANS_gfn_shared )
>>>> @@ -1964,6 +1978,8 @@ static int hvmemul_rep_stos(
>>>>            if ( buf != p_data )
>>>>                xfree(buf);
>>>>    
>>>> +        ASSERT(rc != HVMTRANS_need_retry);
>>>> +
>>>>            switch ( rc )
>>>>            {
>>>>            case HVMTRANS_gfn_paged_out:
>>>
>>> Looking at this again, I think it would better be an addition to
>>> the switch() (using ASSERT_UNREACHABLE()). Generally this is
>>> true for the rep_movs case as well, but that one would first
>>> need converting to switch(), which I agree is beyond the scope
>>
>> I agree that this is beyond the scope of this patch, but it's not that
>> big of a change and it can be done.
>>
>> But wouldn't having a default ASSERT_UNREACHABLE() in both switch
>> statements change the behavior of the function?
> 
> It shouldn't be the default case that gains this assertion,
> but the HVMTRANS_need_retry one that is to be added.
> 
>>> of this change. In both cases a brief comment would seem
>>> worthwhile adding, clarifying that the new return value can
>>> result from hvm_copy_*_guest_linear() only. This might become
>>> relevant in particular if, down the road, we invent more cases
>>> where HVMTRANS_need_retry is produced.
>>
>> Is this comment aimed at the commit message or somewhere else?
> 
> If you go the ASSERT_UNREACHABLE() route, then the comment(s)
> should be code comments next to these assertions. They'd be
> there to avoid people having to dig out the reason for why
> they're there, to make it easy to decide whether it is safe to
> drop them once some new producer of HVMTRANS_need_retry would
> appear.
> 

Alex
Jan Beulich Sept. 23, 2019, 9:39 a.m. UTC | #5
On 23.09.2019 11:00, Alexandru Stefan ISAILA wrote:
> Ok, just to make sure this is what is needed and limit the patch 
> versions, rep_movs / rep_stos should have a switch like this:
> 
>          switch ( rc )
>          {
>          case HVMTRANS_okay:
>              return X86EMUL_OKAY;
>          case HVMTRANS_need_retry:
>              ASSERT_UNREACHABLE();
>              /* fall through */
>          case HVMTRANS_gfn_paged_out:
>          case HVMTRANS_gfn_shared:
>              return X86EMUL_RETRY;
>          }
> 
> Then hvmemul_map_linear_addr() should have:
> 
>          case HVMTRANS_need_retry:
>              ASSERT_UNREACHABLE();
>              /* fall through */
>          case HVMTRANS_gfn_shared:
>          case HVMTRANS_gfn_paged_out:
>              err = ERR_PTR(~X86EMUL_RETRY);
> 

Right, plus a brief comment on the assertions as to why they're
there (to clarify under what condition it would be fine to drop
them).

Jan
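The pattern agreed on in this thread, including the explanatory comment on the assertion, might look like this for the rep_movs/rep_stos tail. It is a hypothetical standalone sketch: the enums are trimmed local mirrors, `rep_tail()` is invented, and `ASSERT_UNREACHABLE()` is approximated with `assert()`. Like Xen's macro, it compiles away when `NDEBUG` is defined, so in release builds the case falls through to `X86EMUL_RETRY`, the fallback value requested in the review.

```c
#include <assert.h>

/* Approximation of Xen's debug-only ASSERT_UNREACHABLE(). */
#define ASSERT_UNREACHABLE() assert(!"unreachable")

/* Trimmed local mirrors; values are illustrative only. */
enum hvm_translation_result {
    HVMTRANS_okay,
    HVMTRANS_unhandleable,
    HVMTRANS_gfn_paged_out,
    HVMTRANS_gfn_shared,
    HVMTRANS_need_retry,
};

enum { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE, X86EMUL_RETRY };

/* Hypothetical tail of a rep_movs-style handler after conversion to switch(). */
static int rep_tail(enum hvm_translation_result rc)
{
    switch ( rc )
    {
    case HVMTRANS_okay:
        return X86EMUL_OKAY;

    case HVMTRANS_need_retry:
        /*
         * Only hvm_copy_*_guest_linear() produces this value, while this
         * path uses physical copies. The assertion can be dropped if a
         * physical-copy producer of HVMTRANS_need_retry is ever added.
         */
        ASSERT_UNREACHABLE();
        /* fall through */
    case HVMTRANS_gfn_paged_out:
    case HVMTRANS_gfn_shared:
        return X86EMUL_RETRY;

    default:
        return X86EMUL_UNHANDLEABLE;
    }
}
```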

Patch

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 36bcb526d3..ee9b97f5b6 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -548,6 +548,7 @@  static void *hvmemul_map_linear_addr(
     unsigned int nr_frames = ((linear + bytes - !!bytes) >> PAGE_SHIFT) -
         (linear >> PAGE_SHIFT) + 1;
     unsigned int i;
+    gfn_t gfn;
 
     /*
      * mfn points to the next free slot.  All used slots have a page reference
@@ -582,7 +583,7 @@  static void *hvmemul_map_linear_addr(
         ASSERT(mfn_x(*mfn) == 0);
 
         res = hvm_translate_get_page(curr, addr, true, pfec,
-                                     &pfinfo, &page, NULL, &p2mt);
+                                     &pfinfo, &page, &gfn, &p2mt);
 
         switch ( res )
         {
@@ -601,6 +602,7 @@  static void *hvmemul_map_linear_addr(
 
         case HVMTRANS_gfn_paged_out:
         case HVMTRANS_gfn_shared:
+        case HVMTRANS_need_retry:
             err = ERR_PTR(~X86EMUL_RETRY);
             goto out;
 
@@ -626,6 +628,14 @@  static void *hvmemul_map_linear_addr(
 
             ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
         }
+
+        if ( unlikely(curr->arch.vm_event) &&
+             curr->arch.vm_event->send_event &&
+             hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
+        {
+            err = ERR_PTR(~X86EMUL_RETRY);
+            goto out;
+        }
     }
 
     /* Entire access within a single frame? */
@@ -1141,6 +1151,7 @@  static int linear_read(unsigned long addr, unsigned int bytes, void *p_data,
 
     case HVMTRANS_gfn_paged_out:
     case HVMTRANS_gfn_shared:
+    case HVMTRANS_need_retry:
         return X86EMUL_RETRY;
     }
 
@@ -1192,6 +1203,7 @@  static int linear_write(unsigned long addr, unsigned int bytes, void *p_data,
 
     case HVMTRANS_gfn_paged_out:
     case HVMTRANS_gfn_shared:
+    case HVMTRANS_need_retry:
         return X86EMUL_RETRY;
     }
 
@@ -1852,6 +1864,8 @@  static int hvmemul_rep_movs(
 
     xfree(buf);
 
+    ASSERT(rc != HVMTRANS_need_retry);
+
     if ( rc == HVMTRANS_gfn_paged_out )
         return X86EMUL_RETRY;
     if ( rc == HVMTRANS_gfn_shared )
@@ -1964,6 +1978,8 @@  static int hvmemul_rep_stos(
         if ( buf != p_data )
             xfree(buf);
 
+        ASSERT(rc != HVMTRANS_need_retry);
+
         switch ( rc )
         {
         case HVMTRANS_gfn_paged_out:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index fdb1e17f59..c82e7b2cd3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3236,6 +3236,15 @@  static enum hvm_translation_result __hvm_copy(
             return HVMTRANS_bad_gfn_to_mfn;
         }
 
+        if ( unlikely(v->arch.vm_event) &&
+             (flags & HVMCOPY_linear) &&
+             v->arch.vm_event->send_event &&
+             hvm_monitor_check_p2m(addr, gfn, pfec, npfec_kind_with_gla) )
+        {
+            put_page(page);
+            return HVMTRANS_need_retry;
+        }
+
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
         if ( flags & HVMCOPY_to_guest )
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index aac22c595d..90202bdcec 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -145,6 +145,7 @@  int hvm_process_io_intercept(const struct hvm_io_handler *handler,
                 case HVMTRANS_bad_linear_to_gfn:
                 case HVMTRANS_gfn_paged_out:
                 case HVMTRANS_gfn_shared:
+                case HVMTRANS_need_retry:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
@@ -174,6 +175,7 @@  int hvm_process_io_intercept(const struct hvm_io_handler *handler,
                 case HVMTRANS_bad_linear_to_gfn:
                 case HVMTRANS_gfn_paged_out:
                 case HVMTRANS_gfn_shared:
+                case HVMTRANS_need_retry:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index 2a41ccc930..7fb1e2c04e 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -23,8 +23,10 @@ 
  */
 
 #include <xen/vm_event.h>
+#include <xen/mem_access.h>
 #include <xen/monitor.h>
 #include <asm/hvm/monitor.h>
+#include <asm/altp2m.h>
 #include <asm/monitor.h>
 #include <asm/paging.h>
 #include <asm/vm_event.h>
@@ -215,6 +217,82 @@  void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
     monitor_traps(current, 1, &req);
 }
 
+/*
+ * Send memory access vm_events based on pfec. Returns true if the event was
+ * sent and false for p2m_get_mem_access() error, no violation and event send
+ * error. Assumes the caller will enable/disable arch.vm_event->send_event.
+ */
+bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
+                           uint16_t kind)
+{
+    xenmem_access_t access;
+    struct vcpu *curr = current;
+    vm_event_request_t req = {};
+    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
+    int rc;
+
+    ASSERT(curr->arch.vm_event->send_event);
+
+    /*
+     * p2m_get_mem_access() can fail for an invalid MFN and return -ESRCH
+     * in which case access must be restricted.
+     */
+    rc = p2m_get_mem_access(curr->domain, gfn, &access, altp2m_vcpu_idx(curr));
+
+    if ( rc == -ESRCH )
+        access = XENMEM_access_n;
+    else if ( rc )
+        return false;
+
+    switch ( access )
+    {
+    case XENMEM_access_x:
+    case XENMEM_access_rx:
+        if ( pfec & PFEC_write_access )
+            req.u.mem_access.flags = MEM_ACCESS_R | MEM_ACCESS_W;
+        break;
+
+    case XENMEM_access_w:
+    case XENMEM_access_rw:
+        if ( pfec & PFEC_insn_fetch )
+            req.u.mem_access.flags = MEM_ACCESS_X;
+        break;
+
+    case XENMEM_access_r:
+    case XENMEM_access_n:
+        if ( pfec & PFEC_write_access )
+            req.u.mem_access.flags |= MEM_ACCESS_R | MEM_ACCESS_W;
+        if ( pfec & PFEC_insn_fetch )
+            req.u.mem_access.flags |= MEM_ACCESS_X;
+        break;
+
+    case XENMEM_access_wx:
+    case XENMEM_access_rwx:
+    case XENMEM_access_rx2rw:
+    case XENMEM_access_n2rwx:
+    case XENMEM_access_default:
+        break;
+    }
+
+    if ( !req.u.mem_access.flags )
+        return false; /* no violation */
+
+    if ( kind == npfec_kind_with_gla )
+        req.u.mem_access.flags |= MEM_ACCESS_FAULT_WITH_GLA |
+                                  MEM_ACCESS_GLA_VALID;
+    else if ( kind == npfec_kind_in_gpt )
+        req.u.mem_access.flags |= MEM_ACCESS_FAULT_IN_GPT |
+                                  MEM_ACCESS_GLA_VALID;
+
+
+    req.reason = VM_EVENT_REASON_MEM_ACCESS;
+    req.u.mem_access.gfn = gfn_x(gfn);
+    req.u.mem_access.gla = gla;
+    req.u.mem_access.offset = gpa & ~PAGE_MASK;
+
+    return monitor_traps(curr, true, &req) >= 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 0144f92b98..320b9fe621 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -210,11 +210,18 @@  bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
             return true;
         }
     }
+
+    /*
+     * Try to avoid sending a mem event. Suppress events caused by page-walks
+     * by emulating but still checking mem_access violations.
+     */
     if ( vm_event_check_ring(d->vm_event_monitor) &&
          d->arch.monitor.inguest_pagefault_disabled &&
-         npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
+         npfec.kind == npfec_kind_in_gpt )
     {
+        v->arch.vm_event->send_event = true;
         hvm_emulate_one_vm_event(EMUL_KIND_NORMAL, TRAP_invalid_op, X86_EVENT_NO_EC);
+        v->arch.vm_event->send_event = false;
 
         return true;
     }
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index 0aa560b7f5..48dfad4557 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -139,6 +139,7 @@  hvm_read(enum x86_segment seg,
         return X86EMUL_UNHANDLEABLE;
     case HVMTRANS_gfn_paged_out:
     case HVMTRANS_gfn_shared:
+    case HVMTRANS_need_retry:
         return X86EMUL_RETRY;
     }
 
diff --git a/xen/include/asm-x86/hvm/monitor.h b/xen/include/asm-x86/hvm/monitor.h
index f1af4f812a..325b44674d 100644
--- a/xen/include/asm-x86/hvm/monitor.h
+++ b/xen/include/asm-x86/hvm/monitor.h
@@ -49,6 +49,9 @@  void hvm_monitor_interrupt(unsigned int vector, unsigned int type,
                            unsigned int err, uint64_t cr2);
 bool hvm_monitor_emul_unimplemented(void);
 
+bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
+                           uint16_t kind);
+
 #endif /* __ASM_X86_HVM_MONITOR_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index e989aa7349..1500e6c94b 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -61,6 +61,7 @@  enum hvm_translation_result {
     HVMTRANS_unhandleable,
     HVMTRANS_gfn_paged_out,
     HVMTRANS_gfn_shared,
+    HVMTRANS_need_retry,
 };
 
 /*
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index 23e655710b..66db9e1e25 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -36,6 +36,8 @@  struct arch_vm_event {
     bool set_gprs;
     /* A sync vm_event has been sent and we're not done handling it. */
     bool sync_event;
+    /* Send mem access events from emulator */
+    bool send_event;
 };
 
 int vm_event_init_domain(struct domain *d);