diff mbox series

[V2,21/23] xen/arm: Add mapcache invalidation handling

Message ID 1602780274-29141-22-git-send-email-olekstysh@gmail.com (mailing list archive)
State New, archived
Headers show
Series IOREQ feature (+ virtio-mmio) on Arm

Commit Message

Oleksandr Tyshchenko Oct. 15, 2020, 4:44 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

We need to send a mapcache invalidation request to qemu/demu every time
a page gets removed from a guest.

At the moment, the Arm code doesn't explicitly remove the existing
mapping before inserting the new mapping. Instead, this is done
implicitly by __p2m_set_entry().

So the corresponding flag will be set in __p2m_set_entry() if the old
entry is a RAM page *and* the new MFN is different. And the invalidation
request will be sent in do_trap_hypercall() later on.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch, some changes were derived from (+ new explanation):
     xen/ioreq: Make x86's invalidate qemu mapcache handling common
   - put setting of the flag into __p2m_set_entry()
   - clarify the conditions when the flag should be set
   - use domain_has_ioreq_server()
   - update do_trap_hypercall() by adding local variable
---
 xen/arch/arm/p2m.c   |  8 ++++++++
 xen/arch/arm/traps.c | 13 ++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

Comments

Jan Beulich Oct. 16, 2020, 6:29 a.m. UTC | #1
On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
>       */
>      if ( p2m_is_valid(orig_pte) &&
>           !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
> +    {
> +#ifdef CONFIG_IOREQ_SERVER
> +        if ( domain_has_ioreq_server(p2m->domain) &&
> +             (p2m->domain == current->domain) && p2m_is_ram(orig_pte.p2m.type) )
> +            p2m->domain->qemu_mapcache_invalidate = true;
> +#endif
>          p2m_free_entry(p2m, orig_pte, level);
> +    }

For all I have to say here, please bear in mind that I don't know
the internals of Arm memory management.

The first odd thing here is the merely MFN-based condition. It may
well be that it's sufficient, if there's no way to get a "not present"
entry with an MFN matching any valid MFN. (This isn't just with
your addition, but even before.)

Given how p2m_free_entry() works (or is supposed to work in the
long run), is the new code you add guaranteed to only alter leaf
entries? If not, the freeing of page tables needs deferring until
after qemu has dropped its mappings.

And with there being refcounting only for foreign pages, how do
you prevent the freeing of the page just unmapped before qemu has
dropped its possible mapping? On the x86 side this problem is one
of the reasons why PVH Dom0 isn't "supported", yet. At least a
respective code comment would seem advisable, so the issue to be
addressed won't be forgotten.

> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1442,6 +1442,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>                                const union hsr hsr)
>  {
>      arm_hypercall_fn_t call = NULL;
> +    struct vcpu *v = current;

This ought to be named "curr".

Jan
Julien Grall Oct. 16, 2020, 8:41 a.m. UTC | #2
Hi Jan,

On 16/10/2020 07:29, Jan Beulich wrote:
> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
>>        */
>>       if ( p2m_is_valid(orig_pte) &&
>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>> +    {
>> +#ifdef CONFIG_IOREQ_SERVER
>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>> +             (p2m->domain == current->domain) && p2m_is_ram(orig_pte.p2m.type) )
>> +            p2m->domain->qemu_mapcache_invalidate = true;
>> +#endif
>>           p2m_free_entry(p2m, orig_pte, level);
>> +    }
> 
> For all I have to say here, please bear in mind that I don't know
> the internals of Arm memory management.
> 
> The first odd thing here the merely MFN-based condition. It may
> well be that's sufficient, if there's no way to get a "not present"
> entry with an MFN matching any valid MFN. (This isn't just with
> your addition, but even before.
Invalid entries are always zeroed. So in theory the problem could arise 
if MFN 0 is used in the guest. It should not be possible on staging, but I 
agree this should be fixed.

> 
> Given how p2m_free_entry() works (or is supposed to work in the
> long run), is the new code you add guaranteed to only alter leaf
> entries?

This path may also be called with tables. I think we want to move the 
check into p2m_free_entry() so we can find the correct leaf type.

> If not, the freeing of page tables needs deferring until
> after qemu has dropped its mappings.

Freeing the page tables doesn't release a page. So may I ask why we 
would need to defer it?

> 
> And with there being refcounting only for foreign pages, how do
> you prevent the freeing of the page just unmapped before qemu has
> dropped its possible mapping?
QEMU mappings can only be done using the foreign mapping interface. This 
means that the page reference count will be incremented for each QEMU 
mapping. Therefore the page cannot disappear until QEMU has dropped the 
last reference.

> On the x86 side this problem is one
> of the reasons why PVH Dom0 isn't "supported", yet. At least a
> respective code comment would seem advisable, so the issue to be
> addressed won't be forgotten.

Are you sure? Isn't it because you don't take a reference on foreign pages 
while mapping them?

Anyway, Arm has supported foreign mapping since its inception. So if 
there is a bug, then it should be fixed.

Cheers,
Jan Beulich Oct. 16, 2020, 8:56 a.m. UTC | #3
On 16.10.2020 10:41, Julien Grall wrote:
> On 16/10/2020 07:29, Jan Beulich wrote:
>> Given how p2m_free_entry() works (or is supposed to work in the
>> long run), is the new code you add guaranteed to only alter leaf
>> entries?
> 
> This path may also be called with tables. I think we want to move the 
> check in p2m_free_entry() so we can find the correct leaf type.
> 
>> If not, the freeing of page tables needs deferring until
>> after qemu has dropped its mappings.
> 
> Freeing the page tables doesn't release a page. So may I ask why we 
> would need to defer it?

Oh, sorry - qemu of course doesn't use the same p2m, so the
intermediate page tables are private to the subject guest.

>> And with there being refcounting only for foreign pages, how do
>> you prevent the freeing of the page just unmapped before qemu has
>> dropped its possible mapping?
> QEMU mappings can only be done using the foreign mapping interface. This 
> means that page reference count will be incremented for each QEMU 
> mappings. Therefore the page cannot disappear until QEMU dropped the 
> last reference.

Okay, sorry for the noise then.

Jan
Oleksandr Tyshchenko Nov. 11, 2020, 12:03 a.m. UTC | #4
On 16.10.20 11:41, Julien Grall wrote:

Hi Jan, Julien

Sorry for the late response.

> Hi Jan,
>
> On 16/10/2020 07:29, Jan Beulich wrote:
>> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain 
>>> *p2m,
>>>        */
>>>       if ( p2m_is_valid(orig_pte) &&
>>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>>> +    {
>>> +#ifdef CONFIG_IOREQ_SERVER
>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>> +             (p2m->domain == current->domain) && 
>>> p2m_is_ram(orig_pte.p2m.type) )
>>> +            p2m->domain->qemu_mapcache_invalidate = true;
>>> +#endif
>>>           p2m_free_entry(p2m, orig_pte, level);
>>> +    }
>>
>> For all I have to say here, please bear in mind that I don't know
>> the internals of Arm memory management.
>>
>> The first odd thing here the merely MFN-based condition. It may
>> well be that's sufficient, if there's no way to get a "not present"
>> entry with an MFN matching any valid MFN. (This isn't just with
>> your addition, but even before.
> Invalid entries are always zeroed. So in theory the problem could 
> arise if MFN 0 used in the guest. It should not be possible on 
> staging, but I agree this should be fixed.
>
>>
>> Given how p2m_free_entry() works (or is supposed to work in the
>> long run), is the new code you add guaranteed to only alter leaf
>> entries?
>
> This path may also be called with tables. I think we want to move the 
> check in p2m_free_entry() so we can find the correct leaf type.

Well, but inside p2m_free_entry() we don't have the new entry in order to 
check whether the new MFN is actually different. An extra arg only comes to 
mind...

[Didn't update call sites yet and didn't test]

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 5bb23df..4001f46 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -739,7 +739,7 @@ static void p2m_put_l3_page(const lpae_t pte)

  /* Free lpae sub-tree behind an entry */
  static void p2m_free_entry(struct p2m_domain *p2m,
-                           lpae_t entry, unsigned int level)
+                           lpae_t entry, lpae_t new_entry, unsigned int level)
  {
      unsigned int i;
      lpae_t *table;
@@ -750,17 +750,19 @@ static void p2m_free_entry(struct p2m_domain *p2m,
      if ( !p2m_is_valid(entry) )
          return;

-    /* Nothing to do but updating the stats if the entry is a super-page. */
-    if ( p2m_is_superpage(entry, level) )
+    if ( p2m_is_superpage(entry, level) || (level == 3) )
      {
-        p2m->stats.mappings[level]--;
-        return;
-    }
+#ifdef CONFIG_IOREQ_SERVER
+        if ( domain_has_ioreq_server(p2m->domain) &&
+             (p2m->domain == current->domain) &&
+             !mfn_eq(lpae_get_mfn(new_entry), lpae_get_mfn(entry)) &&
+             p2m_is_ram(entry.p2m.type) )
+            p2m->domain->qemu_mapcache_invalidate = true;
+#endif

-    if ( level == 3 )
-    {
          p2m->stats.mappings[level]--;
-        p2m_put_l3_page(entry);
+        if ( level == 3 )
+            p2m_put_l3_page(entry);
          return;
      }

Julien Grall Nov. 11, 2020, 7:27 p.m. UTC | #5
Hi Oleksandr,

On 11/11/2020 00:03, Oleksandr wrote:
> 
> On 16.10.20 11:41, Julien Grall wrote:
>> On 16/10/2020 07:29, Jan Beulich wrote:
>>> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>>>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct p2m_domain 
>>>> *p2m,
>>>>        */
>>>>       if ( p2m_is_valid(orig_pte) &&
>>>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>>>> +    {
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>>> +             (p2m->domain == current->domain) && 
>>>> p2m_is_ram(orig_pte.p2m.type) )
>>>> +            p2m->domain->qemu_mapcache_invalidate = true;
>>>> +#endif
>>>>           p2m_free_entry(p2m, orig_pte, level);
>>>> +    }
>>>
>>> For all I have to say here, please bear in mind that I don't know
>>> the internals of Arm memory management.
>>>
>>> The first odd thing here the merely MFN-based condition. It may
>>> well be that's sufficient, if there's no way to get a "not present"
>>> entry with an MFN matching any valid MFN. (This isn't just with
>>> your addition, but even before.
>> Invalid entries are always zeroed. So in theory the problem could 
>> arise if MFN 0 used in the guest. It should not be possible on 
>> staging, but I agree this should be fixed.
>>
>>>
>>> Given how p2m_free_entry() works (or is supposed to work in the
>>> long run), is the new code you add guaranteed to only alter leaf
>>> entries?
>>
>> This path may also be called with tables. I think we want to move the 
>> check in p2m_free_entry() so we can find the correct leaf type.
> 
> Well, but inside p2m_free_entry() we don't have a new entry in order to 
> check whether new MFN is actually different. An extra arg only comes to 
> mind...
Aside from the recursive call, there are two users of p2m_free_entry():
   - When we fail to shatter a superpage in OOM
   - When the entry is replaced by an entry with a different base

I wouldn't be too concerned about sending a spurious mapcache invalidation 
in an error path. So I don't think you need to know the new entry.

What you need to know is the type of the leaf.

Cheers,
Oleksandr Tyshchenko Nov. 11, 2020, 7:42 p.m. UTC | #6
On 11.11.20 21:27, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
> On 11/11/2020 00:03, Oleksandr wrote:
>>
>> On 16.10.20 11:41, Julien Grall wrote:
>>> On 16/10/2020 07:29, Jan Beulich wrote:
>>>> On 15.10.2020 18:44, Oleksandr Tyshchenko wrote:
>>>>> @@ -1067,7 +1068,14 @@ static int __p2m_set_entry(struct 
>>>>> p2m_domain *p2m,
>>>>>        */
>>>>>       if ( p2m_is_valid(orig_pte) &&
>>>>>            !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
>>>>> +    {
>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>>>> +             (p2m->domain == current->domain) && 
>>>>> p2m_is_ram(orig_pte.p2m.type) )
>>>>> +            p2m->domain->qemu_mapcache_invalidate = true;
>>>>> +#endif
>>>>>           p2m_free_entry(p2m, orig_pte, level);
>>>>> +    }
>>>>
>>>> For all I have to say here, please bear in mind that I don't know
>>>> the internals of Arm memory management.
>>>>
>>>> The first odd thing here the merely MFN-based condition. It may
>>>> well be that's sufficient, if there's no way to get a "not present"
>>>> entry with an MFN matching any valid MFN. (This isn't just with
>>>> your addition, but even before.
>>> Invalid entries are always zeroed. So in theory the problem could 
>>> arise if MFN 0 used in the guest. It should not be possible on 
>>> staging, but I agree this should be fixed.
>>>
>>>>
>>>> Given how p2m_free_entry() works (or is supposed to work in the
>>>> long run), is the new code you add guaranteed to only alter leaf
>>>> entries?
>>>
>>> This path may also be called with tables. I think we want to move 
>>> the check in p2m_free_entry() so we can find the correct leaf type.
>>
>> Well, but inside p2m_free_entry() we don't have a new entry in order 
>> to check whether new MFN is actually different. An extra arg only 
>> comes to mind...
> Aside the recursive call, there are two users for p2m_free_entry():
>   - When we fail to shatter a superpage in OOM
>   - When the entry is replaced by an entry with a different base
>
> I wouldn't be too concerned to send spurious mapcache invalidation in 
> an error path. So I don't think you need to know the new entry.

Thank you for the clarification, sounds reasonable.


>
> What you need to know if the type of the leaf.

Yes, to check whether it is a RAM page.

Patch

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 370173c..2693b0c 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@ 
 #include <xen/cpu.h>
 #include <xen/domain_page.h>
 #include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
@@ -1067,7 +1068,14 @@  static int __p2m_set_entry(struct p2m_domain *p2m,
      */
     if ( p2m_is_valid(orig_pte) &&
          !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
+    {
+#ifdef CONFIG_IOREQ_SERVER
+        if ( domain_has_ioreq_server(p2m->domain) &&
+             (p2m->domain == current->domain) && p2m_is_ram(orig_pte.p2m.type) )
+            p2m->domain->qemu_mapcache_invalidate = true;
+#endif
         p2m_free_entry(p2m, orig_pte, level);
+    }
 
 out:
     unmap_domain_page(table);
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index a8f5fdf..9eaa342 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1442,6 +1442,7 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
                               const union hsr hsr)
 {
     arm_hypercall_fn_t call = NULL;
+    struct vcpu *v = current;
 
     BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
 
@@ -1458,7 +1459,7 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
         return;
     }
 
-    current->hcall_preempted = false;
+    v->hcall_preempted = false;
 
     perfc_incra(hypercalls, *nr);
     call = arm_hypercall_table[*nr].fn;
@@ -1471,7 +1472,7 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
     HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
 
 #ifndef NDEBUG
-    if ( !current->hcall_preempted )
+    if ( !v->hcall_preempted )
     {
         /* Deliberately corrupt parameter regs used by this hypercall. */
         switch ( arm_hypercall_table[*nr].nr_args ) {
@@ -1488,8 +1489,14 @@  static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
 #endif
 
     /* Ensure the hypercall trap instruction is re-executed. */
-    if ( current->hcall_preempted )
+    if ( v->hcall_preempted )
         regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
+
+#ifdef CONFIG_IOREQ_SERVER
+    if ( unlikely(v->domain->qemu_mapcache_invalidate) &&
+         test_and_clear_bool(v->domain->qemu_mapcache_invalidate) )
+        send_invalidate_ioreq();
+#endif
 }
 
 void arch_hypercall_tasklet_result(struct vcpu *v, long res)