
[v8] x86/emulate: Send vm_event from emulate

Message ID 20190903140112.18806-1-aisaila@bitdefender.com (mailing list archive)
State Superseded
Series: [v8] x86/emulate: Send vm_event from emulate

Commit Message

Alexandru Stefan ISAILA Sept. 3, 2019, 2:01 p.m. UTC
A/D bit writes (on page walks) can be considered benign by an introspection
agent, so receiving vm_events for them is a pessimization. We try here to
optimize by filtering these events out.
Currently, we are fully emulating the instruction at RIP when the hardware sees
an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
incorrect, because the instruction at RIP might legitimately cause an
EPT fault of its own while accessing a _different_ page from the original one,
where A/D were set.
The solution is to perform the whole emulation, while ignoring EPT restrictions
for the walk part, and taking them into account for the "actual" emulating of
the instruction at RIP. When we send out a vm_event, we don't want the emulation
to complete, since in that case we won't be able to veto whatever it is doing.
That would mean that we can't actually prevent any malicious activity, instead
we'd only be able to report on it.
When we see a "send-vm_event" case while emulating, we need to first send the
event out and then suspend the emulation (return X86EMUL_RETRY).
Once the emulation stops, hvm_vm_event_do_resume() will be called again after
the introspection agent has handled the event and resumed the guest. There, the
instruction at RIP will be fully emulated (with the EPT ignored) if the
introspection application allows it, and the guest will continue to run past
the instruction.
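The send-then-suspend protocol described above can be sketched as a toy model. All names and values below are illustrative stand-ins (the real X86EMUL_* codes live in Xen's x86_emulate.h), not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative return codes, not the real Xen values. */
#define X86EMUL_OKAY  0
#define X86EMUL_RETRY 3

struct toy_vcpu { bool send_event; };

/* First pass: a monitored access sends the event (consuming send_event)
 * and suspends emulation; the resume pass emulates to completion. */
static int toy_emulate(struct toy_vcpu *v, bool violation)
{
    if ( v->send_event && violation )
    {
        v->send_event = false;   /* the event goes out exactly once */
        return X86EMUL_RETRY;    /* suspend so the agent can veto */
    }
    return X86EMUL_OKAY;         /* resume path: run past the instruction */
}
```

The second call for the same instruction then completes, which is the "fully emulated with the EPT ignored" resume path above.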

A common example: the hardware exits because of an EPT fault caused by a
page walk, and p2m_mem_access_check() decides whether to send a vm_event.
If the vm_event is sent and then handled such that the instruction at RIP is
run, that instruction might also hit a protected page and provoke a second
vm_event.

Now if npfec.kind != npfec_kind_with_gla and d->arch.monitor.inguest_pagefault_disabled
is true then we are in the page walk case and we can do this emulation optimization
and emulate the page walk while ignoring the EPT, but don't ignore the EPT for the
emulation of the actual instruction.

In the first case we would have 2 EPT events, in the second case we would have
1 EPT event if the instruction at the RIP triggers an EPT event.

We use hvmemul_map_linear_addr() to intercept r/w access and
__hvm_copy() to intercept exec access.

hvm_emulate_send_vm_event() can return false if there was no violation,
if there was an error from monitor_traps() or p2m_get_mem_access().
Returning false if p2m_get_mem_access() fails if the entry was not found
in the EPT in which case it is unrestricted.
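The rights-versus-fault check at the heart of hvm_emulate_send_vm_event() can be modeled as a pure function of the page's p2m access rights and the fault's error code. This is a simplified, stand-alone sketch: the enum and macros are local stand-ins for the Xen definitions (the PFEC_* and MEM_ACCESS_* bit positions are believed to match the real headers, but treat them as assumptions):

```c
#include <assert.h>
#include <stdint.h>

#define PFEC_write_access (1u << 1)   /* x86 #PF error code W/R bit */
#define PFEC_insn_fetch   (1u << 4)   /* x86 #PF error code I/D bit */

#define MEM_ACCESS_R (1u << 0)
#define MEM_ACCESS_W (1u << 1)
#define MEM_ACCESS_X (1u << 2)

typedef enum {
    access_n, access_r, access_w, access_x,
    access_rw, access_rx, access_wx, access_rwx,
} access_t;

static uint32_t violation_flags(access_t access, uint32_t pfec)
{
    uint32_t flags = 0;

    switch ( access )
    {
    case access_x:
    case access_rx:
        if ( pfec & PFEC_write_access )   /* write to a non-writable page */
            flags = MEM_ACCESS_R | MEM_ACCESS_W;
        break;

    case access_w:
    case access_rw:
        if ( pfec & PFEC_insn_fetch )     /* fetch from a non-exec page */
            flags = MEM_ACCESS_X;
        break;

    case access_r:
    case access_n:
        if ( pfec & PFEC_write_access )
            flags |= MEM_ACCESS_R | MEM_ACCESS_W;
        if ( pfec & PFEC_insn_fetch )
            flags |= MEM_ACCESS_X;
        break;

    default:                              /* wx, rwx: nothing to report */
        break;
    }

    return flags;                         /* 0: no violation, no vm_event */
}
```

A zero result corresponds to the "return false; /* no violation */" path in the patch.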

NOTE: hvm_emulate_send_vm_event() assumes the caller will check
arch.vm_event->send_event

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>

---
Changes since V7:
	- Change commit message
	- Fix indentation
	- Remove send_event = false
	- Remove goto out from the send_vm_event check
	- Init err with null and remove the err = NULL from
hvmemul_map_linear_addr()
	- Check err after the loop
	- Add assert for send_event in hvm_emulate_send_vm_event()
---
 xen/arch/x86/hvm/emulate.c        | 85 ++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/hvm.c            |  8 +++
 xen/arch/x86/mm/mem_access.c      |  1 +
 xen/include/asm-x86/hvm/emulate.h |  4 ++
 xen/include/asm-x86/vm_event.h    |  2 +
 5 files changed, 94 insertions(+), 6 deletions(-)

Comments

Jan Beulich Sept. 6, 2019, 3:46 p.m. UTC | #1
On 03.09.2019 16:01, Alexandru Stefan ISAILA wrote:
> A/D bit writes (on page walks) can be considered benign by an introspection
> agent, so receiving vm_events for them is a pessimization. We try here to
> optimize by filtering these events out.
> Currently, we are fully emulating the instruction at RIP when the hardware sees
> an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
> incorrect, because the instruction at RIP might legitimately cause an
> EPT fault of its own while accessing a _different_ page from the original one,
> where A/D were set.
> The solution is to perform the whole emulation, while ignoring EPT restrictions
> for the walk part, and taking them into account for the "actual" emulating of
> the instruction at RIP. When we send out a vm_event, we don't want the emulation
> to complete, since in that case we won't be able to veto whatever it is doing.
> That would mean that we can't actually prevent any malicious activity, instead
> we'd only be able to report on it.
> When we see a "send-vm_event" case while emulating, we need to first send the
> event out and then suspend the emulation (return X86EMUL_RETRY).
> Once the emulation stops, hvm_vm_event_do_resume() will be called again after
> the introspection agent has handled the event and resumed the guest. There,
> the instruction at RIP will be fully emulated (with the EPT ignored) if the
> introspection application allows it, and the guest will continue to run past
> the instruction.
> 
> A common example: the hardware exits because of an EPT fault caused by a
> page walk, and p2m_mem_access_check() decides whether to send a vm_event.
> If the vm_event is sent and then handled such that the instruction at RIP is
> run, that instruction might also hit a protected page and provoke a second
> vm_event.
> 
> Now if npfec.kind != npfec_kind_with_gla and d->arch.monitor.inguest_pagefault_disabled
> is true then we are in the page walk case and we can do this emulation optimization
> and emulate the page walk while ignoring the EPT, but don't ignore the EPT for the
> emulation of the actual instruction.

Instead of comparing against npfec_kind_with_gla, wouldn't you
better compare against npfec_kind_in_gpt to positively identify
page table accesses by the guest? Both VMX and SVM may leave the
field at npfec_kind_unknown after all.
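The distinction Jan raises can be made concrete: npfec.kind is a three-valued field, so a "!=" test also matches the unclassified case. A hedged sketch, with the enum mirroring the kinds named in this thread (values and helper names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

enum npfec_kind {
    npfec_kind_unknown,   /* hardware did not classify the fault */
    npfec_kind_in_gpt,    /* fault during a guest page-table walk */
    npfec_kind_with_gla,  /* fault on the access itself, GLA valid */
};

/* v8 check: the optimization path is taken for unknown faults too. */
static bool take_walk_path_v8(enum npfec_kind k)
{
    return k != npfec_kind_with_gla;
}

/* Suggested check: only positively identified page-table accesses. */
static bool take_walk_path_suggested(enum npfec_kind k)
{
    return k == npfec_kind_in_gpt;
}
```

The two predicates differ exactly for npfec_kind_unknown, which is the case VMX and SVM may leave behind.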

> In the first case we would have 2 EPT events, in the second case we would have
> 1 EPT event if the instruction at the RIP triggers an EPT event.
> 
> We use hvmemul_map_linear_addr() to intercept r/w access and
> __hvm_copy() to intercept exec access.
> 
> hvm_emulate_send_vm_event() can return false if there was no violation,
> if there was an error from monitor_traps() or p2m_get_mem_access().
> Returning false if p2m_get_mem_access() fails if the entry was not found
> in the EPT in which case it is unrestricted.

I'm afraid I can't interpret what the second sentence is supposed to
clarify.

> @@ -531,6 +533,72 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
>      return hvmemul_do_io_addr(1, mmio_gpa, reps, size, dir, df, ram_gpa);
>  }
>  
> +/*
> + * Send memory access vm_events based on pfec. Returns true if the event was
> + * sent and false for p2m_get_mem_access() error, no violation and event send
> + * error. Assumes the caller will check arch.vm_event->send_event.
> + *
> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in the EPT
> + * (in which case access to it is unrestricted, so no violations can occur).
> + * In this case it is fine to continue the emulation.
> + */
> +bool hvm_emulate_send_vm_event(unsigned long gla, gfn_t gfn,
> +                               uint32_t pfec)
> +{
> +    xenmem_access_t access;
> +    vm_event_request_t req = {};
> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
> +
> +    ASSERT(current->arch.vm_event->send_event);
> +
> +    current->arch.vm_event->send_event = false;
> +
> +    if ( p2m_get_mem_access(current->domain, gfn, &access,
> +                            altp2m_vcpu_idx(current)) != 0 )
> +        return false;
> +
> +    switch ( access )
> +    {
> +    case XENMEM_access_x:
> +    case XENMEM_access_rx:
> +        if ( pfec & PFEC_write_access )
> +            req.u.mem_access.flags = MEM_ACCESS_R | MEM_ACCESS_W;
> +        break;
> +
> +    case XENMEM_access_w:
> +    case XENMEM_access_rw:
> +        if ( pfec & PFEC_insn_fetch )
> +            req.u.mem_access.flags = MEM_ACCESS_X;
> +        break;
> +
> +    case XENMEM_access_r:
> +    case XENMEM_access_n:
> +        if ( pfec & PFEC_write_access )
> +            req.u.mem_access.flags |= MEM_ACCESS_R | MEM_ACCESS_W;
> +        if ( pfec & PFEC_insn_fetch )
> +            req.u.mem_access.flags |= MEM_ACCESS_X;
> +        break;
> +
> +    case XENMEM_access_wx:
> +    case XENMEM_access_rwx:
> +    case XENMEM_access_rx2rw:
> +    case XENMEM_access_n2rwx:
> +    case XENMEM_access_default:
> +        break;
> +    }
> +
> +    if ( !req.u.mem_access.flags )
> +        return false; /* no violation */
> +
> +    req.reason = VM_EVENT_REASON_MEM_ACCESS;
> +    req.u.mem_access.gfn = gfn_x(gfn);
> +    req.u.mem_access.flags |= MEM_ACCESS_FAULT_WITH_GLA | MEM_ACCESS_GLA_VALID;
> +    req.u.mem_access.gla = gla;
> +    req.u.mem_access.offset = gpa & ~PAGE_MASK;
> +
> +    return monitor_traps(current, true, &req) >= 0;
> +}

Since it's non-static anyway, wouldn't this better live in
hvm/vm_event.c, putting it under VM EVENT maintainership? In any
event I don't feel qualified eventually giving an ack for this
function.

> @@ -544,10 +612,11 @@ static void *hvmemul_map_linear_addr(
>      struct hvm_emulate_ctxt *hvmemul_ctxt)
>  {
>      struct vcpu *curr = current;
> -    void *err, *mapping;
> +    void *err = NULL, *mapping;
>      unsigned int nr_frames = ((linear + bytes - !!bytes) >> PAGE_SHIFT) -
>          (linear >> PAGE_SHIFT) + 1;
>      unsigned int i;
> +    gfn_t gfn;
>  
>      /*
>       * mfn points to the next free slot.  All used slots have a page reference
> @@ -582,7 +651,7 @@ static void *hvmemul_map_linear_addr(
>          ASSERT(mfn_x(*mfn) == 0);
>  
>          res = hvm_translate_get_page(curr, addr, true, pfec,
> -                                     &pfinfo, &page, NULL, &p2mt);
> +                                     &pfinfo, &page, &gfn, &p2mt);
>  
>          switch ( res )
>          {
> @@ -596,7 +665,6 @@ static void *hvmemul_map_linear_addr(
>              goto out;
>  
>          case HVMTRANS_bad_gfn_to_mfn:
> -            err = NULL;
>              goto out;
>  
>          case HVMTRANS_gfn_paged_out:
> @@ -619,14 +687,19 @@ static void *hvmemul_map_linear_addr(
>              }
>  
>              if ( p2mt == p2m_ioreq_server )
> -            {
> -                err = NULL;
>                  goto out;
> -            }
>  
>              ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
>          }
> +
> +        if ( curr->arch.vm_event &&

Just like you have in __hvm_copy(), may I ask for unlikely() here?

> +             curr->arch.vm_event->send_event &&
> +             hvm_emulate_send_vm_event(addr, gfn, pfec) )
> +            err = ERR_PTR(~X86EMUL_RETRY);

In the description you say this takes care of r/w, and __hvm_copy()
takes care of exec, but the function we're in doesn't get called
on the read path (but it may be down the road). Additionally the
function now also gets called from hvmemul_cache_op(), which I'm
not sure you actually want to send events for.

>      }
> +    /* Check if eny vm_event was sent */

"any" and please add blank line ahead of your addition.

> +    if ( err )
> +        goto out;

And wait - why does this sit after the loop? Is that a re-basing
mistake from when you had put on top of my uncommitted patch?

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3224,6 +3224,14 @@ static enum hvm_translation_result __hvm_copy(
>              return HVMTRANS_bad_gfn_to_mfn;
>          }
>  
> +        if ( unlikely(v->arch.vm_event) &&
> +             v->arch.vm_event->send_event &&
> +             hvm_emulate_send_vm_event(addr, gfn, pfec) )
> +        {
> +            put_page(page);
> +            return HVMTRANS_gfn_paged_out;
> +        }

I'm pretty sure I had previously asked for there to either be
a change to the return value, or for there to be a comment
justifying the (apparent) abuse of the one used.

Jan
Alexandru Stefan ISAILA Sept. 9, 2019, 10:01 a.m. UTC | #2
On 06.09.2019 18:46, Jan Beulich wrote:
> On 03.09.2019 16:01, Alexandru Stefan ISAILA wrote:
>> A/D bit writes (on page walks) can be considered benign by an introspection
>> agent, so receiving vm_events for them is a pessimization. We try here to
>> optimize by filtering these events out.
>> Currently, we are fully emulating the instruction at RIP when the hardware sees
>> an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
>> incorrect, because the instruction at RIP might legitimately cause an
>> EPT fault of its own while accessing a _different_ page from the original one,
>> where A/D were set.
>> The solution is to perform the whole emulation, while ignoring EPT restrictions
>> for the walk part, and taking them into account for the "actual" emulating of
>> the instruction at RIP. When we send out a vm_event, we don't want the emulation
>> to complete, since in that case we won't be able to veto whatever it is doing.
>> That would mean that we can't actually prevent any malicious activity, instead
>> we'd only be able to report on it.
>> When we see a "send-vm_event" case while emulating, we need to first send the
>> event out and then suspend the emulation (return X86EMUL_RETRY).
>> Once the emulation stops, hvm_vm_event_do_resume() will be called again after
>> the introspection agent has handled the event and resumed the guest. There,
>> the instruction at RIP will be fully emulated (with the EPT ignored) if the
>> introspection application allows it, and the guest will continue to run past
>> the instruction.
>>
>> A common example: the hardware exits because of an EPT fault caused by a
>> page walk, and p2m_mem_access_check() decides whether to send a vm_event.
>> If the vm_event is sent and then handled such that the instruction at RIP is
>> run, that instruction might also hit a protected page and provoke a second
>> vm_event.
>>
>> Now if npfec.kind != npfec_kind_with_gla and d->arch.monitor.inguest_pagefault_disabled
>> is true then we are in the page walk case and we can do this emulation optimization
>> and emulate the page walk while ignoring the EPT, but don't ignore the EPT for the
>> emulation of the actual instruction.
> 
> Instead of comparing against npfec_kind_with_gla, wouldn't you
> better compare against npfec_kind_in_gpt to positively identify
> page table accesses by the guest? Both VMX and SVM may leave the
> field at npfec_kind_unknown after all.

I think it is safer to check against npfec_kind_in_gpt. I have to run
some tests with this, but from the code it looks good.

> 
>> In the first case we would have 2 EPT events, in the second case we would have
>> 1 EPT event if the instruction at the RIP triggers an EPT event.
>>
>> We use hvmemul_map_linear_addr() to intercept r/w access and
>> __hvm_copy() to intercept exec access.
>>
>> hvm_emulate_send_vm_event() can return false if there was no violation,
>> if there was an error from monitor_traps() or p2m_get_mem_access().
>> Returning false if p2m_get_mem_access() fails if the entry was not found
>> in the EPT in which case it is unrestricted.
> 
> I'm afraid I can't interpret what the second sentence is supposed to
> clarify.

It is meant to clarify why returning false when p2m_get_mem_access()
fails is needed in hvm_emulate_send_vm_event(). The sentence is indeed
unclear and should be reformulated to something like:

"Returning false if p2m_get_mem_access() fails is needed because the EPT 
entry will have rwx memory access rights."

> 
>> @@ -531,6 +533,72 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
>>       return hvmemul_do_io_addr(1, mmio_gpa, reps, size, dir, df, ram_gpa);
>>   }
>>   
>> +/*
>> + * Send memory access vm_events based on pfec. Returns true if the event was
>> + * sent and false for p2m_get_mem_access() error, no violation and event send
>> + * error. Assumes the caller will check arch.vm_event->send_event.
>> + *
>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in the EPT
>> + * (in which case access to it is unrestricted, so no violations can occur).
>> + * In this case it is fine to continue the emulation.
>> + */
>> +bool hvm_emulate_send_vm_event(unsigned long gla, gfn_t gfn,
>> +                               uint32_t pfec)
>> +{
>> +    xenmem_access_t access;
>> +    vm_event_request_t req = {};
>> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>> +
>> +    ASSERT(current->arch.vm_event->send_event);
>> +
>> +    current->arch.vm_event->send_event = false;
>> +
>> +    if ( p2m_get_mem_access(current->domain, gfn, &access,
>> +                            altp2m_vcpu_idx(current)) != 0 )
>> +        return false;
>> +
>> +    switch ( access )
>> +    {
>> +    case XENMEM_access_x:
>> +    case XENMEM_access_rx:
>> +        if ( pfec & PFEC_write_access )
>> +            req.u.mem_access.flags = MEM_ACCESS_R | MEM_ACCESS_W;
>> +        break;
>> +
>> +    case XENMEM_access_w:
>> +    case XENMEM_access_rw:
>> +        if ( pfec & PFEC_insn_fetch )
>> +            req.u.mem_access.flags = MEM_ACCESS_X;
>> +        break;
>> +
>> +    case XENMEM_access_r:
>> +    case XENMEM_access_n:
>> +        if ( pfec & PFEC_write_access )
>> +            req.u.mem_access.flags |= MEM_ACCESS_R | MEM_ACCESS_W;
>> +        if ( pfec & PFEC_insn_fetch )
>> +            req.u.mem_access.flags |= MEM_ACCESS_X;
>> +        break;
>> +
>> +    case XENMEM_access_wx:
>> +    case XENMEM_access_rwx:
>> +    case XENMEM_access_rx2rw:
>> +    case XENMEM_access_n2rwx:
>> +    case XENMEM_access_default:
>> +        break;
>> +    }
>> +
>> +    if ( !req.u.mem_access.flags )
>> +        return false; /* no violation */
>> +
>> +    req.reason = VM_EVENT_REASON_MEM_ACCESS;
>> +    req.u.mem_access.gfn = gfn_x(gfn);
>> +    req.u.mem_access.flags |= MEM_ACCESS_FAULT_WITH_GLA | MEM_ACCESS_GLA_VALID;
>> +    req.u.mem_access.gla = gla;
>> +    req.u.mem_access.offset = gpa & ~PAGE_MASK;
>> +
>> +    return monitor_traps(current, true, &req) >= 0;
>> +}
> 
> Since it's non-static anyway, wouldn't this better live in
> hvm/vm_event.c, putting it under VM EVENT maintainership? In any
> event I don't feel qualified eventually giving an ack for this
> function.

I agree, I will move it in vm_event.c.

> 
>> @@ -544,10 +612,11 @@ static void *hvmemul_map_linear_addr(
>>       struct hvm_emulate_ctxt *hvmemul_ctxt)
>>   {
>>       struct vcpu *curr = current;
>> -    void *err, *mapping;
>> +    void *err = NULL, *mapping;
>>       unsigned int nr_frames = ((linear + bytes - !!bytes) >> PAGE_SHIFT) -
>>           (linear >> PAGE_SHIFT) + 1;
>>       unsigned int i;
>> +    gfn_t gfn;
>>   
>>       /*
>>        * mfn points to the next free slot.  All used slots have a page reference
>> @@ -582,7 +651,7 @@ static void *hvmemul_map_linear_addr(
>>           ASSERT(mfn_x(*mfn) == 0);
>>   
>>           res = hvm_translate_get_page(curr, addr, true, pfec,
>> -                                     &pfinfo, &page, NULL, &p2mt);
>> +                                     &pfinfo, &page, &gfn, &p2mt);
>>   
>>           switch ( res )
>>           {
>> @@ -596,7 +665,6 @@ static void *hvmemul_map_linear_addr(
>>               goto out;
>>   
>>           case HVMTRANS_bad_gfn_to_mfn:
>> -            err = NULL;
>>               goto out;
>>   
>>           case HVMTRANS_gfn_paged_out:
>> @@ -619,14 +687,19 @@ static void *hvmemul_map_linear_addr(
>>               }
>>   
>>               if ( p2mt == p2m_ioreq_server )
>> -            {
>> -                err = NULL;
>>                   goto out;
>> -            }
>>   
>>               ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
>>           }
>> +
>> +        if ( curr->arch.vm_event &&
> 
> Just like you have in __hvm_copy(), may I ask for unlikely() here?

I will add this.

> 
>> +             curr->arch.vm_event->send_event &&
>> +             hvm_emulate_send_vm_event(addr, gfn, pfec) )
>> +            err = ERR_PTR(~X86EMUL_RETRY);
> 
> In the description you say this takes care of r/w, and __hvm_copy()
> takes care of exec, but the function we're in doesn't get called
> on the read path (but it may be down the road). Additionally the
> function now also gets called from hvmemul_cache_op(), which I'm
> not sure you actually want to send events for.

I was aiming for hvmemul_write()/hvmemul_rmw(), but if an access error
occurs while the option (curr->arch.vm_event->send_event) is active, IMO
it is ok to send a vm_event regardless of where it originated.

> 
>>       }
>> +    /* Check if eny vm_event was sent */
> 
> "any" and please add blank line ahead of your addition.

I will correct this.

> 
>> +    if ( err )
>> +        goto out;
> 
> And wait - why does this sit after the loop? Is that a re-basing
> mistake from when you had put on top of my uncommitted patch?

This is done to skip the mapping part down the line. If there is an 
error then we have to return _it_ and not the mapping.

> 
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -3224,6 +3224,14 @@ static enum hvm_translation_result __hvm_copy(
>>               return HVMTRANS_bad_gfn_to_mfn;
>>           }
>>   
>> +        if ( unlikely(v->arch.vm_event) &&
>> +             v->arch.vm_event->send_event &&
>> +             hvm_emulate_send_vm_event(addr, gfn, pfec) )
>> +        {
>> +            put_page(page);
>> +            return HVMTRANS_gfn_paged_out;
>> +        }
> 
> I'm pretty sure I had previously asked for there to either be
> a change to the return value, or for there to be a comment
> justifying the (apparent) abuse of the one used.

Sorry for not clarifying this in the last version; I looked now and saw
that I had missed responding. I return HVMTRANS_gfn_paged_out so that it
is later handled in a way that returns X86EMUL_RETRY.
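The two options Jan mentions for __hvm_copy() (a justifying comment, or a distinct return value) can be illustrated in a stand-alone way. The enum below is modeled on hvm_translation_result as of this patch; HVMTRANS_need_retry is an invented name used purely for illustration, not something in the Xen tree at this point:

```c
#include <assert.h>

enum hvm_translation_result {
    HVMTRANS_okay,
    HVMTRANS_bad_linear_to_gfn,
    HVMTRANS_bad_gfn_to_mfn,
    HVMTRANS_unhandleable,
    HVMTRANS_gfn_paged_out,
    HVMTRANS_gfn_shared,
    HVMTRANS_need_retry,      /* hypothetical: vm_event sent, must retry */
};

/* Illustrative return codes, not the real Xen values. */
#define X86EMUL_OKAY  0
#define X86EMUL_RETRY 3

/* Both values would funnel into X86EMUL_RETRY, but a dedicated one lets
 * callers and readers distinguish "vm_event sent" from real paging. */
static int toy_to_x86emul(enum hvm_translation_result res)
{
    switch ( res )
    {
    case HVMTRANS_gfn_paged_out:
    case HVMTRANS_need_retry:
        return X86EMUL_RETRY;
    default:
        return X86EMUL_OKAY;
    }
}
```

Reusing HVMTRANS_gfn_paged_out works only because it already maps to the retry path, which is exactly the coincidence a comment (or a dedicated value) would make explicit.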
Jan Beulich Sept. 9, 2019, 10:49 a.m. UTC | #3
On 09.09.2019 12:01, Alexandru Stefan ISAILA wrote:
> On 06.09.2019 18:46, Jan Beulich wrote:
>> On 03.09.2019 16:01, Alexandru Stefan ISAILA wrote:
>>>       }
>>> +    /* Check if eny vm_event was sent */
>>
>> "any" and please add blank line ahead of your addition.
> 
> I will correct this.
> 
>>
>>> +    if ( err )
>>> +        goto out;
>>
>> And wait - why does this sit after the loop? Is that a re-basing
>> mistake from when you had put on top of my uncommitted patch?
> 
> This is done to skip the mapping part down the line. If there is an 
> error then we have to return _it_ and not the mapping.

But after re-basing you could (and hence imo should) "goto out"
right from the code blob you add to the loop. Which would then
also eliminate the need for other "err" related adjustments you
make.

Jan
Alexandru Stefan ISAILA Sept. 9, 2019, 11:03 a.m. UTC | #4
On 09.09.2019 13:49, Jan Beulich wrote:
> On 09.09.2019 12:01, Alexandru Stefan ISAILA wrote:
>> On 06.09.2019 18:46, Jan Beulich wrote:
>>> On 03.09.2019 16:01, Alexandru Stefan ISAILA wrote:
>>>>        }
>>>> +    /* Check if eny vm_event was sent */
>>>
>>> "any" and please add blank line ahead of your addition.
>>
>> I will correct this.
>>
>>>
>>>> +    if ( err )
>>>> +        goto out;
>>>
>>> And wait - why does this sit after the loop? Is that a re-basing
>>> mistake from when you had put on top of my uncommitted patch?
>>
>> This is done to skip the mapping part down the line. If there is an
>> error then we have to return _it_ and not the mapping.
> 
> But after re-basing you could (and hence imo should) "goto out"
> right from the code blob you add to the loop. Which would then
> also eliminate the need for other "err" related adjustments you
> make.
> 

So you want me to base this patch on your (x86/HVM: correct
hvmemul_map_linear_addr() for multi-page case) patch. Ok, I can do that
in the next version, but that would make the patches dependent: mine could
only go in after everything is resolved in your patch, and that could
take some time.

Could I propose moving the goto into the loop and, when your patch is
ready to merge, simply replacing the err = ERR_PTR(~X86EMUL_RETRY); with
update_map_err(err, ~X86EMUL_RETRY); ?
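For illustration, the two loop-exit shapes under discussion can be modeled in a self-contained way (toy functions with hypothetical names, not the real hvmemul_map_linear_addr()):

```c
#include <assert.h>
#include <stddef.h>

#define ERR_RETRY ((void *)-1)  /* stand-in for ERR_PTR(~X86EMUL_RETRY) */

/* v8 shape: record the error, keep iterating, test after the loop. */
static void *map_v8(int nr_frames, int event_frame)
{
    void *err = NULL;
    for ( int i = 0; i < nr_frames; i++ )
        if ( i == event_frame )            /* vm_event sent for this frame */
            err = ERR_RETRY;
    if ( err )                             /* post-loop check */
        return err;
    return NULL;                           /* NULL: proceed to map (toy) */
}

/* Suggested shape: bail out from inside the loop ("goto out"), making
 * the post-loop check and the err initialisation unnecessary. */
static void *map_suggested(int nr_frames, int event_frame)
{
    for ( int i = 0; i < nr_frames; i++ )
        if ( i == event_frame )
            return ERR_RETRY;
    return NULL;
}
```

Both return the same result here; the suggested shape just stops touching further frames once the event has been sent.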

Thanks,
Alex
Jan Beulich Sept. 9, 2019, 11:15 a.m. UTC | #5
On 09.09.2019 13:03, Alexandru Stefan ISAILA wrote:
> 
> 
> On 09.09.2019 13:49, Jan Beulich wrote:
>> On 09.09.2019 12:01, Alexandru Stefan ISAILA wrote:
>>> On 06.09.2019 18:46, Jan Beulich wrote:
>>>> On 03.09.2019 16:01, Alexandru Stefan ISAILA wrote:
>>>>>        }
>>>>> +    /* Check if eny vm_event was sent */
>>>>
>>>> "any" and please add blank line ahead of your addition.
>>>
>>> I will correct this.
>>>
>>>>
>>>>> +    if ( err )
>>>>> +        goto out;
>>>>
>>>> And wait - why does this sit after the loop? Is that a re-basing
>>>> mistake from when you had put on top of my uncommitted patch?
>>>
>>> This is done to skip the mapping part down the line. If there is an
>>> error then we have to return _it_ and not the mapping.
>>
>> But after re-basing you could (and hence imo should) "goto out"
>> right from the code blob you add to the loop. Which would then
>> also eliminate the need for other "err" related adjustments you
>> make.
>>
> 
> So you want me to base this patch on your (x86/HVM: correct
> hvmemul_map_linear_addr() for multi-page case) patch.

No, explicitly not. The "re-basing" I used above was referring to
your having moved away from basing your patch on mine.

Jan
Alexandru Stefan ISAILA Sept. 9, 2019, 11:23 a.m. UTC | #6
On 09.09.2019 14:15, Jan Beulich wrote:
> On 09.09.2019 13:03, Alexandru Stefan ISAILA wrote:
>>
>>
>> On 09.09.2019 13:49, Jan Beulich wrote:
>>> On 09.09.2019 12:01, Alexandru Stefan ISAILA wrote:
>>>> On 06.09.2019 18:46, Jan Beulich wrote:
>>>>> On 03.09.2019 16:01, Alexandru Stefan ISAILA wrote:
>>>>>>         }
>>>>>> +    /* Check if eny vm_event was sent */
>>>>>
>>>>> "any" and please add blank line ahead of your addition.
>>>>
>>>> I will correct this.
>>>>
>>>>>
>>>>>> +    if ( err )
>>>>>> +        goto out;
>>>>>
>>>>> And wait - why does this sit after the loop? Is that a re-basing
>>>>> mistake from when you had put on top of my uncommitted patch?
>>>>
>>>> This is done to skip the mapping part down the line. If there is an
>>>> error then we have to return _it_ and not the mapping.
>>>
>>> But after re-basing you could (and hence imo should) "goto out"
>>> right from the code blob you add to the loop. Which would then
>>> also eliminate the need for other "err" related adjustments you
>>> make.
>>>
>>
>> So you want me to base this patch on your (x86/HVM: correct
>> hvmemul_map_linear_addr() for multi-page case) patch.
> 
> No, explicitly not. The "re-basing" I used above was referring to
> your having moved away from basing your patch on mine.
> 

Ok, I misunderstood; thanks for the explanation. I will modify the patch
as I posted.

Thanks for the review,
Alex

Patch

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index e4b3f330a8..4ab113ef58 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -12,9 +12,11 @@ 
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/monitor.h>
 #include <xen/paging.h>
 #include <xen/trace.h>
 #include <xen/vm_event.h>
+#include <asm/altp2m.h>
 #include <asm/event.h>
 #include <asm/i387.h>
 #include <asm/xstate.h>
@@ -531,6 +533,72 @@  static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
     return hvmemul_do_io_addr(1, mmio_gpa, reps, size, dir, df, ram_gpa);
 }
 
+/*
+ * Send memory access vm_events based on pfec. Returns true if the event was
+ * sent and false for p2m_get_mem_access() error, no violation and event send
+ * error. Assumes the caller will check arch.vm_event->send_event.
+ *
+ * NOTE: p2m_get_mem_access() can fail if the entry was not found in the EPT
+ * (in which case access to it is unrestricted, so no violations can occur).
+ * In this case it is fine to continue the emulation.
+ */
+bool hvm_emulate_send_vm_event(unsigned long gla, gfn_t gfn,
+                               uint32_t pfec)
+{
+    xenmem_access_t access;
+    vm_event_request_t req = {};
+    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
+
+    ASSERT(current->arch.vm_event->send_event);
+
+    current->arch.vm_event->send_event = false;
+
+    if ( p2m_get_mem_access(current->domain, gfn, &access,
+                            altp2m_vcpu_idx(current)) != 0 )
+        return false;
+
+    switch ( access )
+    {
+    case XENMEM_access_x:
+    case XENMEM_access_rx:
+        if ( pfec & PFEC_write_access )
+            req.u.mem_access.flags = MEM_ACCESS_R | MEM_ACCESS_W;
+        break;
+
+    case XENMEM_access_w:
+    case XENMEM_access_rw:
+        if ( pfec & PFEC_insn_fetch )
+            req.u.mem_access.flags = MEM_ACCESS_X;
+        break;
+
+    case XENMEM_access_r:
+    case XENMEM_access_n:
+        if ( pfec & PFEC_write_access )
+            req.u.mem_access.flags |= MEM_ACCESS_R | MEM_ACCESS_W;
+        if ( pfec & PFEC_insn_fetch )
+            req.u.mem_access.flags |= MEM_ACCESS_X;
+        break;
+
+    case XENMEM_access_wx:
+    case XENMEM_access_rwx:
+    case XENMEM_access_rx2rw:
+    case XENMEM_access_n2rwx:
+    case XENMEM_access_default:
+        break;
+    }
+
+    if ( !req.u.mem_access.flags )
+        return false; /* no violation */
+
+    req.reason = VM_EVENT_REASON_MEM_ACCESS;
+    req.u.mem_access.gfn = gfn_x(gfn);
+    req.u.mem_access.flags |= MEM_ACCESS_FAULT_WITH_GLA | MEM_ACCESS_GLA_VALID;
+    req.u.mem_access.gla = gla;
+    req.u.mem_access.offset = gpa & ~PAGE_MASK;
+
+    return monitor_traps(current, true, &req) >= 0;
+}
+
 /*
  * Map the frame(s) covering an individual linear access, for writeable
  * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
@@ -544,10 +612,11 @@  static void *hvmemul_map_linear_addr(
     struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
     struct vcpu *curr = current;
-    void *err, *mapping;
+    void *err = NULL, *mapping;
     unsigned int nr_frames = ((linear + bytes - !!bytes) >> PAGE_SHIFT) -
         (linear >> PAGE_SHIFT) + 1;
     unsigned int i;
+    gfn_t gfn;
 
     /*
      * mfn points to the next free slot.  All used slots have a page reference
@@ -582,7 +651,7 @@  static void *hvmemul_map_linear_addr(
         ASSERT(mfn_x(*mfn) == 0);
 
         res = hvm_translate_get_page(curr, addr, true, pfec,
-                                     &pfinfo, &page, NULL, &p2mt);
+                                     &pfinfo, &page, &gfn, &p2mt);
 
         switch ( res )
         {
@@ -596,7 +665,6 @@  static void *hvmemul_map_linear_addr(
             goto out;
 
         case HVMTRANS_bad_gfn_to_mfn:
-            err = NULL;
             goto out;
 
         case HVMTRANS_gfn_paged_out:
@@ -619,14 +687,19 @@  static void *hvmemul_map_linear_addr(
             }
 
             if ( p2mt == p2m_ioreq_server )
-            {
-                err = NULL;
                 goto out;
-            }
 
             ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
         }
+
+        if ( curr->arch.vm_event &&
+             curr->arch.vm_event->send_event &&
+             hvm_emulate_send_vm_event(addr, gfn, pfec) )
+            err = ERR_PTR(~X86EMUL_RETRY);
     }
+    /* Check if eny vm_event was sent */
+    if ( err )
+        goto out;
 
     /* Entire access within a single frame? */
     if ( nr_frames == 1 )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2b8189946b..727f195359 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3224,6 +3224,14 @@  static enum hvm_translation_result __hvm_copy(
             return HVMTRANS_bad_gfn_to_mfn;
         }
 
+        if ( unlikely(v->arch.vm_event) &&
+             v->arch.vm_event->send_event &&
+             hvm_emulate_send_vm_event(addr, gfn, pfec) )
+        {
+            put_page(page);
+            return HVMTRANS_gfn_paged_out;
+        }
+
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
         if ( flags & HVMCOPY_to_guest )
diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 0144f92b98..c0faa57db1 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -214,6 +214,7 @@  bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
          d->arch.monitor.inguest_pagefault_disabled &&
          npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
     {
+        v->arch.vm_event->send_event = true;
         hvm_emulate_one_vm_event(EMUL_KIND_NORMAL, TRAP_invalid_op, X86_EVENT_NO_EC);
 
         return true;
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index b39a1a0331..3682efd90b 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -80,6 +80,10 @@  struct segment_register *hvmemul_get_seg_reg(
     enum x86_segment seg,
     struct hvm_emulate_ctxt *hvmemul_ctxt);
 int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla);
+bool hvm_emulate_send_vm_event(
+    unsigned long gla,
+    gfn_t gfn,
+    uint32_t pfec);
 
 static inline bool handle_mmio(void)
 {
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index 23e655710b..66db9e1e25 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -36,6 +36,8 @@  struct arch_vm_event {
     bool set_gprs;
     /* A sync vm_event has been sent and we're not done handling it. */
     bool sync_event;
+    /* Send mem access events from emulator */
+    bool send_event;
 };
 
 int vm_event_init_domain(struct domain *d);