diff mbox series

[V2,21/23] xen/arm: Add mapcache invalidation handling

Message ID 1602780274-29141-22-git-send-email-olekstysh@gmail.com (mailing list archive)
State New, archived
Headers show
Series IOREQ feature (+ virtio-mmio) on Arm

Commit Message

Oleksandr Tyshchenko Oct. 15, 2020, 4:44 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

We need to send a mapcache invalidation request to qemu/demu every time
a page gets removed from a guest.

At the moment, the Arm code doesn't explicitly remove the existing
mapping before inserting the new mapping. Instead, this is done
implicitly by __p2m_set_entry().

So the corresponding flag will be set in __p2m_set_entry() if the old
entry is a RAM page *and* the new MFN is different. And the invalidation
request will be sent in do_trap_hypercall() later on.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch, some changes were derived from (+ new explanation):
     xen/ioreq: Make x86's invalidate qemu mapcache handling common
   - put setting of the flag into __p2m_set_entry()
   - clarify the conditions when the flag should be set
   - use domain_has_ioreq_server()
   - update do_trap_hypercall() by adding local variable
---
 xen/arch/arm/p2m.c   |  8 ++++++++
 xen/arch/arm/traps.c | 13 ++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

Comments

Jan Beulich Oct. 16, 2020, 6:29 a.m. UTC | #1
On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
>       */
>      if ( p2m_is_valid(orig_pte) &&
>           !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
> +    {
> +#ifdef CONFIG_IOREQ_SERVER
> +        if ( domain_has_ioreq_server(p2m->domain) &&
> +             (p2m->domain == current->domain) && p2m_is_ram(orig_pte.p2m.type) )
> +            p2m->domain->qemu_mapcache_invalidate = true;
> +#endif
>          p2m_free_entry(p2m, orig_pte, level);
> +    }

For all I have to say here, please bear in mind that I don't know
the internals of Arm memory management.

The first odd thing here is the merely MFN-based condition. It may
well be that it's sufficient, if there's no way to get a "not present"
entry with an MFN matching any valid MFN. (This isn't just with
your addition, but even before.)

Given how p2m_free_entry() works (or is supposed to work in the
long run), is the new code you add guaranteed to only alter leaf
entries? If not, the freeing of page tables needs deferring until
after qemu has dropped its mappings.

And with there being refcounting only for foreign pages, how do
you prevent the freeing of the page just unmapped before qemu has
dropped its possible mapping? On the x86 side this problem is one
of the reasons why PVH Dom0 isn't "supported", yet. At least a
respective code comment would seem advisable, so the issue to be
addressed won't be forgotten.

> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1442,6 +1442,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>                                const union hsr hsr)
>  {
>      arm_hypercall_fn_t call = NULL;
> +    struct vcpu *v = current;

This ought to be named "curr".

Jan
Julien Grall Oct. 16, 2020, 8:41 a.m. UTC | #2
Hi Jan,

On 16/10/2020 07:29, Jan Beulich wrote:
> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
>>        */
>>       if ( p2m_is_valid(orig_pte) &&
>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>> +    {
>> +#ifdef CONFIG_IOREQ_SERVER
>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>> +             (p2m->domain == current->domain) && p2m_is_ram(orig_pte.p2m.type) )
>> +            p2m->domain->qemu_mapcache_invalidate = true;
>> +#endif
>>           p2m_free_entry(p2m, orig_pte, level);
>> +    }
> 
> For all I have to say here, please bear in mind that I don't know
> the internals of Arm memory management.
> 
> The first odd thing here the merely MFN-based condition. It may
> well be that's sufficient, if there's no way to get a "not present"
> entry with an MFN matching any valid MFN. (This isn't just with
> your addition, but even before.
Invalid entries are always zeroed. So in theory the problem could arise 
if MFN 0 is used in the guest. It should not be possible on staging, but I 
agree this should be fixed.

> 
> Given how p2m_free_entry() works (or is supposed to work in the
> long run), is the new code you add guaranteed to only alter leaf
> entries?

This path may also be called with tables. I think we want to move the 
check into p2m_free_entry() so we can find the correct leaf type.

> If not, the freeing of page tables needs deferring until
> after qemu has dropped its mappings.

Freeing the page tables doesn't release a page. So may I ask why we 
would need to defer it?

> 
> And with there being refcounting only for foreign pages, how do
> you prevent the freeing of the page just unmapped before qemu has
> dropped its possible mapping?
QEMU mappings can only be done using the foreign mapping interface. This 
means that the page reference count will be incremented for each QEMU 
mapping. Therefore the page cannot disappear until QEMU has dropped the 
last reference.

> On the x86 side this problem is one
> of the reasons why PVH Dom0 isn't "supported", yet. At least a
> respective code comment would seem advisable, so the issue to be
> addressed won't be forgotten.

Are you sure? Isn't it because you don't take a reference on foreign pages 
while mapping them?

Anyway, Arm has supported foreign mapping since its inception. So if 
there is a bug, then it should be fixed.

Cheers,
Jan Beulich Oct. 16, 2020, 8:56 a.m. UTC | #3
On 16.10.2020 10:41, Julien Grall wrote:
> On 16/10/2020 07:29, Jan Beulich wrote:
>> Given how p2m_free_entry() works (or is supposed to work in the
>> long run), is the new code you add guaranteed to only alter leaf
>> entries?
> 
> This path may also be called with tables. I think we want to move the 
> check in p2m_free_entry() so we can find the correct leaf type.
> 
>> If not, the freeing of page tables needs deferring until
>> after qemu has dropped its mappings.
> 
> Freeing the page tables doesn't release a page. So may I ask why we 
> would need to defer it?

Oh, sorry - qemu of course doesn't use the same p2m, so the
intermediate page tables are private to the subject guest.

>> And with there being refcounting only for foreign pages, how do
>> you prevent the freeing of the page just unmapped before qemu has
>> dropped its possible mapping?
> QEMU mappings can only be done using the foreign mapping interface. This 
> means that page reference count will be incremented for each QEMU 
> mappings. Therefore the page cannot disappear until QEMU dropped the 
> last reference.

Okay, sorry for the noise then.

Jan
Oleksandr Tyshchenko Nov. 11, 2020, 12:03 a.m. UTC | #4
On 16.10.20 11:41, Julien Grall wrote:

Hi Jan, Julien

Sorry for the late response.

> Hi Jan,
>
> On 16/10/2020 07:29, Jan Beulich wrote:
>> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain 
>>> *p2m,
>>>        */
>>>       if ( p2m_is_valid(orig_pte) &&
>>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>>> +    {
>>> +#ifdef CONFIG_IOREQ_SERVER
>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>> +             (p2m->domain == current->domain) && 
>>> p2m_is_ram(orig_pte.p2m.type) )
>>> +            p2m->domain->qemu_mapcache_invalidate = true;
>>> +#endif
>>>           p2m_free_entry(p2m, orig_pte, level);
>>> +    }
>>
>> For all I have to say here, please bear in mind that I don't know
>> the internals of Arm memory management.
>>
>> The first odd thing here the merely MFN-based condition. It may
>> well be that's sufficient, if there's no way to get a "not present"
>> entry with an MFN matching any valid MFN. (This isn't just with
>> your addition, but even before.
> Invalid entries are always zeroed. So in theory the problem could 
> arise if MFN 0 used in the guest. It should not be possible on 
> staging, but I agree this should be fixed.
>
>>
>> Given how p2m_free_entry() works (or is supposed to work in the
>> long run), is the new code you add guaranteed to only alter leaf
>> entries?
>
> This path may also be called with tables. I think we want to move the 
> check in p2m_free_entry() so we can find the correct leaf type.

Well, but inside p2m_free_entry() we don't have the new entry in order to 
check whether the new MFN is actually different. An extra arg only comes to 
mind...

[Didn't update call sites yet and didn't test]

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 5bb23df..4001f46 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -739,7 +739,7 @@ static void p2m_put_l3_page(const lpae_t pte)

  /* Free lpae sub-tree behind an entry */
  static void p2m_free_entry(struct p2m_domain *p2m,
-                           lpae_t entry, unsigned int level)
+                           lpae_t entry, lpae_t new_entry, unsigned int level)
  {
      unsigned int i;
      lpae_t *table;
@@ -750,17 +750,19 @@ static void p2m_free_entry(struct p2m_domain *p2m,
      if ( !p2m_is_valid(entry) )
          return;

-    /* Nothing to do but updating the stats if the entry is a super-page. */
-    if ( p2m_is_superpage(entry, level) )
+    if ( p2m_is_superpage(entry, level) || (level == 3) )
      {
-        p2m->stats.mappings[level]--;
-        return;
-    }
+#ifdef CONFIG_IOREQ_SERVER
+        if ( domain_has_ioreq_server(p2m->domain) &&
+             (p2m->domain == current->domain) &&
+             !mfn_eq(lpae_get_mfn(new_entry), lpae_get_mfn(entry)) &&
+             p2m_is_ram(entry.p2m.type) )
+            p2m->domain->qemu_mapcache_invalidate = true;
+#endif

-    if ( level == 3 )
-    {
          p2m->stats.mappings[level]--;
-        p2m_put_l3_page(entry);
+        if ( level == 3 )
+            p2m_put_l3_page(entry);
          return;
      }

Julien Grall Nov. 11, 2020, 7:27 p.m. UTC | #5
Hi Oleksandr,

On 11/11/2020 00:03, Oleksandr wrote:
> 
> On 16.10.20 11:41, Julien Grall wrote:
>> On 16/10/2020 07:29, Jan Beulich wrote:
>>> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>>>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain 
>>>> *p2m,
>>>>        */
>>>>       if ( p2m_is_valid(orig_pte) &&
>>>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>>>> +    {
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>>> +             (p2m->domain == current->domain) && 
>>>> p2m_is_ram(orig_pte.p2m.type) )
>>>> +            p2m->domain->qemu_mapcache_invalidate = true;
>>>> +#endif
>>>>           p2m_free_entry(p2m, orig_pte, level);
>>>> +    }
>>>
>>> For all I have to say here, please bear in mind that I don't know
>>> the internals of Arm memory management.
>>>
>>> The first odd thing here the merely MFN-based condition. It may
>>> well be that's sufficient, if there's no way to get a "not present"
>>> entry with an MFN matching any valid MFN. (This isn't just with
>>> your addition, but even before.
>> Invalid entries are always zeroed. So in theory the problem could 
>> arise if MFN 0 used in the guest. It should not be possible on 
>> staging, but I agree this should be fixed.
>>
>>>
>>> Given how p2m_free_entry() works (or is supposed to work in the
>>> long run), is the new code you add guaranteed to only alter leaf
>>> entries?
>>
>> This path may also be called with tables. I think we want to move the 
>> check in p2m_free_entry() so we can find the correct leaf type.
> 
> Well, but inside p2m_free_entry() we don't have a new entry in order to 
> check whether new MFN is actually different. An extra arg only comes to 
> mind...
Aside from the recursive call, there are two users of p2m_free_entry():
   - When we fail to shatter a superpage in OOM
   - When the entry is replaced by an entry with a different base

I wouldn't be too concerned about sending a spurious mapcache invalidation 
in an error path. So I don't think you need to know the new entry.

What you need to know is the type of the leaf.

Cheers,
Oleksandr Tyshchenko Nov. 11, 2020, 7:42 p.m. UTC | #6
On 11.11.20 21:27, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
> On 11/11/2020 00:03, Oleksandr wrote:
>>
>> On 16.10.20 11:41, Julien Grall wrote:
>>> On 16/10/2020 07:29, Jan Beulich wrote:
>>>> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>>>>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct 
>>>>> p2m_domain *p2m,
>>>>>        */
>>>>>       if ( p2m_is_valid(orig_pte) &&
>>>>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>>>>> +    {
>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>>>> +             (p2m->domain == current->domain) && 
>>>>> p2m_is_ram(orig_pte.p2m.type) )
>>>>> +            p2m->domain->qemu_mapcache_invalidate = true;
>>>>> +#endif
>>>>>           p2m_free_entry(p2m, orig_pte, level);
>>>>> +    }
>>>>
>>>> For all I have to say here, please bear in mind that I don't know
>>>> the internals of Arm memory management.
>>>>
>>>> The first odd thing here the merely MFN-based condition. It may
>>>> well be that's sufficient, if there's no way to get a "not present"
>>>> entry with an MFN matching any valid MFN. (This isn't just with
>>>> your addition, but even before.
>>> Invalid entries are always zeroed. So in theory the problem could 
>>> arise if MFN 0 used in the guest. It should not be possible on 
>>> staging, but I agree this should be fixed.
>>>
>>>>
>>>> Given how p2m_free_entry() works (or is supposed to work in the
>>>> long run), is the new code you add guaranteed to only alter leaf
>>>> entries?
>>>
>>> This path may also be called with tables. I think we want to move 
>>> the check in p2m_free_entry() so we can find the correct leaf type.
>>
>> Well, but inside p2m_free_entry() we don't have a new entry in order 
>> to check whether new MFN is actually different. An extra arg only 
>> comes to mind...
> Aside the recursive call, there are two users for p2m_free_entry():
>   - When we fail to shatter a superpage in OOM
>   - When the entry is replaced by an entry with a different base
>
> I wouldn't be too concerned to send spurious mapcache invalidation in 
> an error path. So I don't think you need to know the new entry.

Thank you for the clarification, sounds reasonable.


>
> What you need to know if the type of the leaf.

Yes, to check whether it is a RAM page.

Patch

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 370173c..2693b0c 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@ 
 #include <xen/cpu.h>
 #include <xen/domain_page.h>
 #include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
@@ -1067,7 +1068,14 @@  static int __p2m_set_entry(struct p2m_domain *p2m,
      */
     if ( p2m_is_valid(orig_pte) &&
          !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
+    {
+#ifdef CONFIG_IOREQ_SERVER
+        if ( domain_has_ioreq_server(p2m->domain) &&
+             (p2m->domain == current->domain) && p2m_is_ram(orig_pte.p2m.type) )
+            p2m->domain->qemu_mapcache_invalidate = true;
+#endif
         p2m_free_entry(p2m, orig_pte, level);
+    }
 
 out:
     unmap_domain_page(table);
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index a8f5fdf..9eaa342 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1442,6 +1442,7 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
                               const union hsr hsr)
 {
     arm_hypercall_fn_t call = NULL;
+    struct vcpu *v = current;
 
     BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
 
@@ -1458,7 +1459,7 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
         return;
     }
 
-    current->hcall_preempted = false;
+    v->hcall_preempted = false;
 
     perfc_incra(hypercalls, *nr);
     call = arm_hypercall_table[*nr].fn;
@@ -1471,7 +1472,7 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
     HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
 
 #ifndef NDEBUG
-    if ( !current->hcall_preempted )
+    if ( !v->hcall_preempted )
     {
         /* Deliberately corrupt parameter regs used by this hypercall. */
         switch ( arm_hypercall_table[*nr].nr_args ) {
@@ -1488,8 +1489,14 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
 #endif
 
     /* Ensure the hypercall trap instruction is re-executed. */
-    if ( current->hcall_preempted )
+    if ( v->hcall_preempted )
         regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
+
+#ifdef CONFIG_IOREQ_SERVER
+    if ( unlikely(v->domain->qemu_mapcache_invalidate) &&
+         test_and_clear_bool(v->domain->qemu_mapcache_invalidate) )
+        send_invalidate_ioreq();
+#endif
 }
 
 void arch_hypercall_tasklet_result(struct vcpu *v, long res)