diff mbox series

[v2] xen/arm: p2m: Populate pages for GICv2 mapping in arch_domain_create()

Message ID 20221014080917.14980-1-Henry.Wang@arm.com (mailing list archive)
State Superseded
Headers show
Series [v2] xen/arm: p2m: Populate pages for GICv2 mapping in arch_domain_create() | expand

Commit Message

Henry Wang Oct. 14, 2022, 8:09 a.m. UTC
Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
when the domain is created. Considering the worst case of page tables
which requires 6 P2M pages as the two pages will be consecutive but not
necessarily in the same L3 page table and keep a buffer, populate 16
pages as the default value to the P2M pages pool in arch_domain_create()
at the domain creation stage to satisfy the GICv2 requirement. For
GICv3, the above-mentioned P2M mapping is not necessary, but since the
allocated 16 pages here would not be lost, hence populate these pages
unconditionally.

With the default 16 P2M pages populated, there would be a case that
failures would happen in the domain creation with P2M pages already in
use. To properly free the P2M for this case, firstly support the
optionally preemption of p2m_teardown(), then call p2m_teardown() and
p2m_set_allocation(d, 0, NULL) in p2m_final_teardown() if needed.

Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
This should also be backported to 4.13, 4.14, 4.15 and 4.16.
v2 changes:
- Move the p2m_set_allocation(d, 0, NULL); to p2m_final_teardown().
- Support optionally preemption of p2m_teardown(), and make the calling of
  p2m_teardown() preemptively when relinquish the resources, non-preemptively
  in p2m_final_teardown().
- Refactor the error handling to make the code use less spin_unlock.
- Explain the worst case of page tables and the unconditional population
  of pages in commit message.
- Mention the unconditional population of pages in in-code comment.
---
 xen/arch/arm/domain.c          | 16 +++++++++++++++-
 xen/arch/arm/include/asm/p2m.h | 11 +++++++----
 xen/arch/arm/p2m.c             | 15 +++++++++++++--
 3 files changed, 35 insertions(+), 7 deletions(-)

Comments

Jan Beulich Oct. 14, 2022, 9:21 a.m. UTC | #1
On 14.10.2022 10:09, Henry Wang wrote:
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -740,6 +740,20 @@ int arch_domain_create(struct domain *d,
>          BUG();
>      }
>  
> +    /*
> +     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
> +     * when the domain is created. Considering the worst case for page
> +     * tables and keep a buffer, populate 16 pages to the P2M pages pool here.
> +     * For GICv3, the above-mentioned P2M mapping is not necessary, but since
> +     * the allocated 16 pages here would not be lost, hence populate these
> +     * pages unconditionally.
> +     */
> +    spin_lock(&d->arch.paging.lock);
> +    rc = p2m_set_allocation(d, 16, NULL);
> +    spin_unlock(&d->arch.paging.lock);
> +    if ( rc != 0 )
> +        goto fail;

Putting this level of knowledge here feels like a layering violation to
me. My first suggestion would be to move this call somewhere under
p2m_init(). If that's not possible for some reason, I'd like to suggest
passing 1 here as the count and then adding a min-acceptable check to
p2m_set_allocation() along the lines of x86'es shadow_set_allocation().
That way you'd also guarantee the minimum number of pages in case a
subsequent tiny allocation request came in via domctl.

> @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
>      if ( !p2m->domain )
>          return;
>  
> +    if ( !page_list_empty(&p2m->pages) )
> +        p2m_teardown(d, false);
> +
> +    if ( d->arch.paging.p2m_total_pages != 0 )
> +    {
> +        spin_lock(&d->arch.paging.lock);
> +        p2m_set_allocation(d, 0, NULL);
> +        spin_unlock(&d->arch.paging.lock);
> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
> +    }

Is it intentional to largely open-code p2m_teardown_allocation() here?

Jan
Henry Wang Oct. 14, 2022, 9:28 a.m. UTC | #2
Hi Jan,

Thanks for the review.

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> 
> On 14.10.2022 10:09, Henry Wang wrote:
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -740,6 +740,20 @@ int arch_domain_create(struct domain *d,
> >          BUG();
> >      }
> >
> > +    /*
> > +     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2
> area
> > +     * when the domain is created. Considering the worst case for page
> > +     * tables and keep a buffer, populate 16 pages to the P2M pages pool
> here.
> > +     * For GICv3, the above-mentioned P2M mapping is not necessary, but
> since
> > +     * the allocated 16 pages here would not be lost, hence populate these
> > +     * pages unconditionally.
> > +     */
> > +    spin_lock(&d->arch.paging.lock);
> > +    rc = p2m_set_allocation(d, 16, NULL);
> > +    spin_unlock(&d->arch.paging.lock);
> > +    if ( rc != 0 )
> > +        goto fail;
> 
> Putting this level of knowledge here feels like a layering violation to
> me. My first suggestion would be to move this call somewhere under
> p2m_init().

That is definitely possible. If Julien or other Arm maintainers are not
against that I am happy to move this to p2m_init() in v3.

The reason why the above block is placed here is just I thought to use
d->arch.vgic.version to only populate the 16 pages for GICv2 in the
beginning, and d->arch.vgic.version is first assigned later after p2m_init(),
but later we decided to populated the pages unconditionally so actually
now we can move the part to p2m_init().

> If that's not possible for some reason, I'd like to suggest
> passing 1 here as the count and then adding a min-acceptable check to
> p2m_set_allocation() along the lines of x86'es shadow_set_allocation().
> That way you'd also guarantee the minimum number of pages in case a
> subsequent tiny allocation request came in via domctl.
> 
> > @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
> >      if ( !p2m->domain )
> >          return;
> >
> > +    if ( !page_list_empty(&p2m->pages) )
> > +        p2m_teardown(d, false);
> > +
> > +    if ( d->arch.paging.p2m_total_pages != 0 )
> > +    {
> > +        spin_lock(&d->arch.paging.lock);
> > +        p2m_set_allocation(d, 0, NULL);
> > +        spin_unlock(&d->arch.paging.lock);
> > +        ASSERT(d->arch.paging.p2m_total_pages == 0);
> > +    }
> 
> Is it intentional to largely open-code p2m_teardown_allocation() here?

Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't want
any preemption here.

Kind regards,
Henry

> 
> Jan
Jan Beulich Oct. 14, 2022, 9:57 a.m. UTC | #3
On 14.10.2022 11:28, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>> On 14.10.2022 10:09, Henry Wang wrote:
>>> @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
>>>      if ( !p2m->domain )
>>>          return;
>>>
>>> +    if ( !page_list_empty(&p2m->pages) )
>>> +        p2m_teardown(d, false);
>>> +
>>> +    if ( d->arch.paging.p2m_total_pages != 0 )
>>> +    {
>>> +        spin_lock(&d->arch.paging.lock);
>>> +        p2m_set_allocation(d, 0, NULL);
>>> +        spin_unlock(&d->arch.paging.lock);
>>> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
>>> +    }
>>
>> Is it intentional to largely open-code p2m_teardown_allocation() here?
> 
> Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't want
> any preemption here.

Well, this can be dealt with by adding a parameter to the function, or
by looping over it until it returns other than -ERESTART. Both would
seem better to me than this duplication of functionality (but I'm not
a maintainer of this code, as you know).

Jan
Julien Grall Oct. 14, 2022, 10:13 a.m. UTC | #4
On 14/10/2022 10:28, Henry Wang wrote:
> Hi Jan,
> 
> Thanks for the review.
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
>> arch_domain_create()
>>
>> On 14.10.2022 10:09, Henry Wang wrote:
>>> --- a/xen/arch/arm/domain.c
>>> +++ b/xen/arch/arm/domain.c
>>> @@ -740,6 +740,20 @@ int arch_domain_create(struct domain *d,
>>>           BUG();
>>>       }
>>>
>>> +    /*
>>> +     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2
>> area
>>> +     * when the domain is created. Considering the worst case for page
>>> +     * tables and keep a buffer, populate 16 pages to the P2M pages pool
>> here.
>>> +     * For GICv3, the above-mentioned P2M mapping is not necessary, but
>> since
>>> +     * the allocated 16 pages here would not be lost, hence populate these
>>> +     * pages unconditionally.
>>> +     */
>>> +    spin_lock(&d->arch.paging.lock);
>>> +    rc = p2m_set_allocation(d, 16, NULL);
>>> +    spin_unlock(&d->arch.paging.lock);
>>> +    if ( rc != 0 )
>>> +        goto fail;
>>
>> Putting this level of knowledge here feels like a layering violation to
>> me. My first suggestion would be to move this call somewhere under
>> p2m_init().
> 
> That is definitely possible. If Julien or other Arm maintainers are not
> against that I am happy to move this to p2m_init() in v3.
I understand both of Jan and your concern. I don't really have a strong 
opinion either way.

You are the author of the patch, so I will let you chose.

> 
> The reason why the above block is placed here is just I thought to use
> d->arch.vgic.version to only populate the 16 pages for GICv2 in the
> beginning, and d->arch.vgic.version is first assigned later after p2m_init(),
> but later we decided to populated the pages unconditionally so actually
> now we can move the part to p2m_init().
> 
>> If that's not possible for some reason, I'd like to suggest
>> passing 1 here as the count and then adding a min-acceptable check to
>> p2m_set_allocation() along the lines of x86'es shadow_set_allocation().
>> That way you'd also guarantee the minimum number of pages in case a
>> subsequent tiny allocation request came in via domctl.

I really dislike this. If the user ask for 1 pages and we only allow 16. 
Then this should be rejected (not bumped to 16).

However, the code in p2m_set_allocation() will only look at the free 
pages (like x86 does). So what you suggest would not do what you want.

>>
>>> @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
>>>       if ( !p2m->domain )
>>>           return;
>>>
>>> +    if ( !page_list_empty(&p2m->pages) )
>>> +        p2m_teardown(d, false);
>>> +
>>> +    if ( d->arch.paging.p2m_total_pages != 0 )
>>> +    {
>>> +        spin_lock(&d->arch.paging.lock);
>>> +        p2m_set_allocation(d, 0, NULL);
>>> +        spin_unlock(&d->arch.paging.lock);
>>> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
>>> +    }
>>
>> Is it intentional to largely open-code p2m_teardown_allocation() here?
> 
> Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't want
> any preemption here.

Like Jan, I would prefer if we can avoid the duplication. The loop 
suggested by Jan should work.

Cheers,
Julien Grall Oct. 14, 2022, 10:31 a.m. UTC | #5
Hi Henry,

On 14/10/2022 09:09, Henry Wang wrote:
> Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
> when the domain is created. Considering the worst case of page tables
> which requires 6 P2M pages as the two pages will be consecutive but not
> necessarily in the same L3 page table and keep a buffer, populate 16
> pages as the default value to the P2M pages pool in arch_domain_create()
> at the domain creation stage to satisfy the GICv2 requirement. For
> GICv3, the above-mentioned P2M mapping is not necessary, but since the
> allocated 16 pages here would not be lost, hence populate these pages
> unconditionally.
> 
> With the default 16 P2M pages populated, there would be a case that
> failures would happen in the domain creation with P2M pages already in
> use. To properly free the P2M for this case, firstly support the
> optionally preemption of p2m_teardown(), then call p2m_teardown() and
> p2m_set_allocation(d, 0, NULL) in p2m_final_teardown() if needed.
> 
> Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> ---
> This should also be backported to 4.13, 4.14, 4.15 and 4.16.
> v2 changes:
> - Move the p2m_set_allocation(d, 0, NULL); to p2m_final_teardown().
> - Support optionally preemption of p2m_teardown(), and make the calling of
>    p2m_teardown() preemptively when relinquish the resources, non-preemptively
>    in p2m_final_teardown().
> - Refactor the error handling to make the code use less spin_unlock.
> - Explain the worst case of page tables and the unconditional population
>    of pages in commit message.
> - Mention the unconditional population of pages in in-code comment.
> ---
>   xen/arch/arm/domain.c          | 16 +++++++++++++++-
>   xen/arch/arm/include/asm/p2m.h | 11 +++++++----
>   xen/arch/arm/p2m.c             | 15 +++++++++++++--
>   3 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 2c84e6dbbb..831e248ad7 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -740,6 +740,20 @@ int arch_domain_create(struct domain *d,
>           BUG();
>       }
>   
> +    /*
> +     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
> +     * when the domain is created. Considering the worst case for page
> +     * tables and keep a buffer, populate 16 pages to the P2M pages pool here.
> +     * For GICv3, the above-mentioned P2M mapping is not necessary, but since
> +     * the allocated 16 pages here would not be lost, hence populate these
> +     * pages unconditionally.
> +     */
> +    spin_lock(&d->arch.paging.lock);
> +    rc = p2m_set_allocation(d, 16, NULL);
> +    spin_unlock(&d->arch.paging.lock);
> +    if ( rc != 0 )
> +        goto fail;
> +
>       if ( (rc = domain_vgic_register(d, &count)) != 0 )
>           goto fail;
>   
> @@ -1064,7 +1078,7 @@ int domain_relinquish_resources(struct domain *d)
>               return ret;
>   
>       PROGRESS(p2m):
> -        ret = p2m_teardown(d);
> +        ret = p2m_teardown(d, true);
>           if ( ret )
>               return ret;
>   
> diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
> index 42bfd548c4..480d65e95e 100644
> --- a/xen/arch/arm/include/asm/p2m.h
> +++ b/xen/arch/arm/include/asm/p2m.h
> @@ -194,14 +194,17 @@ int p2m_init(struct domain *d);
>   
>   /*
>    * The P2M resources are freed in two parts:
> - *  - p2m_teardown() will be called when relinquish the resources. It
> - *    will free large resources (e.g. intermediate page-tables) that
> - *    requires preemption.
> + *  - p2m_teardown() will be called preemptively when relinquish the
> + *    resources, in which case it will free large resources (e.g. intermediate
> + *    page-tables) that requires preemption.
>    *  - p2m_final_teardown() will be called when domain struct is been
>    *    freed. This *cannot* be preempted and therefore one small
>    *    resources should be freed here.
> + *  Note that p2m_final_teardown() will also call p2m_teardown(), to properly
> + *  free the P2M when failures happen in the domain creation with P2M pages
> + *  already in use. In this case p2m_teardown() is called non-preemptively.
>    */
> -int p2m_teardown(struct domain *d);
> +int p2m_teardown(struct domain *d, bool allow_preemption);
>   void p2m_final_teardown(struct domain *d);
>   
>   /*
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index f17500ddf3..707bd3e2e3 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1685,7 +1685,7 @@ static void p2m_free_vmid(struct domain *d)
>       spin_unlock(&vmid_alloc_lock);
>   }
>   
> -int p2m_teardown(struct domain *d)
> +int p2m_teardown(struct domain *d, bool allow_preemption)
>   {
I think the part to clean & invalidate the root should not be necessary 
if the domain is not scheduled. Similarly, I think we might only need to 
do once by domain (rather than for every call). So I would consider to 
move the logic outside of the function.

That's not for 4.17 thought.

>       struct p2m_domain *p2m = p2m_get_hostp2m(d);
>       unsigned long count = 0;
> @@ -1716,7 +1716,7 @@ int p2m_teardown(struct domain *d)
>           p2m_free_page(p2m->domain, pg);
>           count++;
>           /* Arbitrarily preempt every 512 iterations */
> -        if ( !(count % 512) && hypercall_preempt_check() )
> +        if ( allow_preemption && !(count % 512) && hypercall_preempt_check() )
>           {
>               rc = -ERESTART;
>               break;
> @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
>       if ( !p2m->domain )
>           return;
>   
> +    if ( !page_list_empty(&p2m->pages) )

Did you add this check to avoid the clean & invalidate if the list is empty?

> +        p2m_teardown(d, false);

Today, it should be fine to ignore p2m_teardown(). But I would prefer if 
we add an ASSERT()/BUG_ON() (or else) to make confirm this is the case.

This also wants to be documented on top of p2m_teardown() as it would be 
easier to know that the function should always return 0 when 
!allow_preemption is not set.

I also noticed that relinquish_p2m_mapping() is not called. This should 
be fine for us because arch_domain_create() should never create a 
mapping that requires p2m_put_l3_page() to be called.

I think it would be good to check it in __p2m_set_entry(). So we don't 
end up to add such mappings by mistake.

I would have suggested to add a comment only for version and send a 
follow-up patch. But I don't exactly know where to put it.

> +
> +    if ( d->arch.paging.p2m_total_pages != 0 )
> +    {
> +        spin_lock(&d->arch.paging.lock);
> +        p2m_set_allocation(d, 0, NULL);
> +        spin_unlock(&d->arch.paging.lock);
> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
> +    }
> +
>       ASSERT(page_list_empty(&p2m->pages));

I would move this assert between the two ifs you added.

>       ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
>   

Cheers,
Henry Wang Oct. 14, 2022, 10:38 a.m. UTC | #6
Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> >>> +    spin_lock(&d->arch.paging.lock);
> >>> +    rc = p2m_set_allocation(d, 16, NULL);
> >>> +    spin_unlock(&d->arch.paging.lock);
> >>> +    if ( rc != 0 )
> >>> +        goto fail;
> >>
> >> Putting this level of knowledge here feels like a layering violation to
> >> me. My first suggestion would be to move this call somewhere under
> >> p2m_init().
> >
> > That is definitely possible. If Julien or other Arm maintainers are not
> > against that I am happy to move this to p2m_init() in v3.
> I understand both of Jan and your concern. I don't really have a strong
> opinion either way.
> 
> You are the author of the patch, so I will let you chose.

Then p2m_init(), just want to make everyone happy :)))

> 
> >>> +    if ( d->arch.paging.p2m_total_pages != 0 )
> >>> +    {
> >>> +        spin_lock(&d->arch.paging.lock);
> >>> +        p2m_set_allocation(d, 0, NULL);
> >>> +        spin_unlock(&d->arch.paging.lock);
> >>> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
> >>> +    }
> >>
> >> Is it intentional to largely open-code p2m_teardown_allocation() here?
> >
> > Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't want
> > any preemption here.
> 
> Like Jan, I would prefer if we can avoid the duplication. The loop
> suggested by Jan should work.

I am a little bit worried about the -ENOMEM, if -ENOMEM is
returned from p2m_teardown_allocation(d), I think we are in
the infinite loop, or did I miss understood the loop that Jan referred
to?

Kind regards,
Henry

> 
> Cheers,
> 
> --
> Julien Grall
Jan Beulich Oct. 14, 2022, 10:45 a.m. UTC | #7
On 14.10.2022 12:38, Henry Wang wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>>>>> +    if ( d->arch.paging.p2m_total_pages != 0 )
>>>>> +    {
>>>>> +        spin_lock(&d->arch.paging.lock);
>>>>> +        p2m_set_allocation(d, 0, NULL);
>>>>> +        spin_unlock(&d->arch.paging.lock);
>>>>> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
>>>>> +    }
>>>>
>>>> Is it intentional to largely open-code p2m_teardown_allocation() here?
>>>
>>> Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't want
>>> any preemption here.
>>
>> Like Jan, I would prefer if we can avoid the duplication. The loop
>> suggested by Jan should work.
> 
> I am a little bit worried about the -ENOMEM, if -ENOMEM is
> returned from p2m_teardown_allocation(d), I think we are in
> the infinite loop, or did I miss understood the loop that Jan referred
> to?

Where would -ENOMEM come from? We're firmly freeing memory here. -ENOMEM
can only occur for a non-zero 2nd argument.

Jan
Henry Wang Oct. 14, 2022, 10:53 a.m. UTC | #8
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> 
> On 14.10.2022 12:38, Henry Wang wrote:
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >>>>> +    if ( d->arch.paging.p2m_total_pages != 0 )
> >>>>> +    {
> >>>>> +        spin_lock(&d->arch.paging.lock);
> >>>>> +        p2m_set_allocation(d, 0, NULL);
> >>>>> +        spin_unlock(&d->arch.paging.lock);
> >>>>> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
> >>>>> +    }
> >>>>
> >>>> Is it intentional to largely open-code p2m_teardown_allocation() here?
> >>>
> >>> Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't
> want
> >>> any preemption here.
> >>
> >> Like Jan, I would prefer if we can avoid the duplication. The loop
> >> suggested by Jan should work.
> >
> > I am a little bit worried about the -ENOMEM, if -ENOMEM is
> > returned from p2m_teardown_allocation(d), I think we are in
> > the infinite loop, or did I miss understood the loop that Jan referred
> > to?
> 
> Where would -ENOMEM come from? We're firmly freeing memory here. -
> ENOMEM
> can only occur for a non-zero 2nd argument.

My initial thought is the "else if" part in p2m_set_allocation. It might be
wrong. Would the code below seems ok to you?

int err;

do {
    err = p2m_teardown_allocation(d)
} while ( err == -ERESTART )

Kind regards,
Henry

> 
> Jan
Jan Beulich Oct. 14, 2022, 10:59 a.m. UTC | #9
On 14.10.2022 12:53, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>> On 14.10.2022 12:38, Henry Wang wrote:
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>>>>> +    if ( d->arch.paging.p2m_total_pages != 0 )
>>>>>>> +    {
>>>>>>> +        spin_lock(&d->arch.paging.lock);
>>>>>>> +        p2m_set_allocation(d, 0, NULL);
>>>>>>> +        spin_unlock(&d->arch.paging.lock);
>>>>>>> +        ASSERT(d->arch.paging.p2m_total_pages == 0);
>>>>>>> +    }
>>>>>>
>>>>>> Is it intentional to largely open-code p2m_teardown_allocation() here?
>>>>>
>>>>> Yes, AFAICT p2m_teardown_allocation() is preemptible and we don't
>> want
>>>>> any preemption here.
>>>>
>>>> Like Jan, I would prefer if we can avoid the duplication. The loop
>>>> suggested by Jan should work.
>>>
>>> I am a little bit worried about the -ENOMEM, if -ENOMEM is
>>> returned from p2m_teardown_allocation(d), I think we are in
>>> the infinite loop, or did I miss understood the loop that Jan referred
>>> to?
>>
>> Where would -ENOMEM come from? We're firmly freeing memory here. -
>> ENOMEM
>> can only occur for a non-zero 2nd argument.
> 
> My initial thought is the "else if" part in p2m_set_allocation. It might be
> wrong. Would the code below seems ok to you?
> 
> int err;
> 
> do {
>     err = p2m_teardown_allocation(d)
> } while ( err == -ERESTART )

Sure, one of several ways of doing it.

Jan
Henry Wang Oct. 14, 2022, 11:04 a.m. UTC | #10
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> 
> > My initial thought is the "else if" part in p2m_set_allocation. It might be
> > wrong. Would the code below seems ok to you?
> >
> > int err;
> >
> > do {
> >     err = p2m_teardown_allocation(d)
> > } while ( err == -ERESTART )
> 
> Sure, one of several ways of doing it.

Thanks for your confirmation. Just to play safe if you have more simple
Solutions please do raise it. It is a good opportunity for me to learn and
personally I am not a big fan of either do-while or the introduced "err"
which is used only by p2m_teardown_allocation(d), considering the
p2m_final_teardown(d) has a void return type...

Thanks for your patience in advance :)

Kind regards,
Henry


> 
> Jan
Henry Wang Oct. 14, 2022, 11:19 a.m. UTC | #11
Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> 
> Hi Henry,
> 
> >
> > -int p2m_teardown(struct domain *d)
> > +int p2m_teardown(struct domain *d, bool allow_preemption)
> >   {
> I think the part to clean & invalidate the root should not be necessary
> if the domain is not scheduled. Similarly, I think we might only need to
> do once by domain (rather than for every call). So I would consider to
> move the logic outside of the function.
> 
> That's not for 4.17 thought.

Sure, I can prepare the follow up patch after 4.17 as (1) I am also worried
about if this patch would become bigger and bigger (2) I checked you also
want other things in your below comment.

> 
> >       struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >       unsigned long count = 0;
> > @@ -1716,7 +1716,7 @@ int p2m_teardown(struct domain *d)
> >           p2m_free_page(p2m->domain, pg);
> >           count++;
> >           /* Arbitrarily preempt every 512 iterations */
> > -        if ( !(count % 512) && hypercall_preempt_check() )
> > +        if ( allow_preemption && !(count % 512) &&
> hypercall_preempt_check() )
> >           {
> >               rc = -ERESTART;
> >               break;
> > @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
> >       if ( !p2m->domain )
> >           return;
> >
> > +    if ( !page_list_empty(&p2m->pages) )
> 
> Did you add this check to avoid the clean & invalidate if the list is empty?

Yep. I think we only need the p2m_teardown() if we actually have something
in p2m->pages list.

> 
> > +        p2m_teardown(d, false);
> 
> Today, it should be fine to ignore p2m_teardown(). But I would prefer if
> we add an ASSERT()/BUG_ON() (or else) to make confirm this is the case.

Sorry I do not really understand why we can ignore the p2m_teardown()
probably because of my English. Let's talk a bit more in C if you don't mind :))
Do you mean p2m_teardown() should be called here unconditionally without
the if ( !page_list_empty(&p2m->pages) ) check?

> 
> This also wants to be documented on top of p2m_teardown() as it would be
> easier to know that the function should always return 0 when
> !allow_preemption is not set.

Ok, will do.

> 
> I also noticed that relinquish_p2m_mapping() is not called. This should
> be fine for us because arch_domain_create() should never create a
> mapping that requires p2m_put_l3_page() to be called.
> 
> I think it would be good to check it in __p2m_set_entry(). So we don't
> end up to add such mappings by mistake.

I thought for a while but failed to translate the above requirements
to proper if conditions in __p2m_set_entry()...

> 
> I would have suggested to add a comment only for version and send a
> follow-up patch. But I don't exactly know where to put it.

...how about p2m_final_teardown(), we can use a TODO to explain why
we don't need to call relinquish_p2m_mapping() and a following patch
can fix this?

> 
> > +
> > +    if ( d->arch.paging.p2m_total_pages != 0 )
> > +    {
> > +        spin_lock(&d->arch.paging.lock);
> > +        p2m_set_allocation(d, 0, NULL);
> > +        spin_unlock(&d->arch.paging.lock);
> > +        ASSERT(d->arch.paging.p2m_total_pages == 0);
> > +    }
> > +
> >       ASSERT(page_list_empty(&p2m->pages));
> 
> I would move this assert between the two ifs you added.

Sure, will do in v3.

Kind regards,
Henry

> 
> >       ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
> >
> 
> Cheers,
> 
> --
> Julien Grall
Jan Beulich Oct. 14, 2022, noon UTC | #12
On 14.10.2022 13:04, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>>> My initial thought is the "else if" part in p2m_set_allocation. It might be
>>> wrong. Would the code below seems ok to you?
>>>
>>> int err;
>>>
>>> do {
>>>     err = p2m_teardown_allocation(d)
>>> } while ( err == -ERESTART )
>>
>> Sure, one of several ways of doing it.
> 
> Thanks for your confirmation. Just to play safe if you have more simple
> Solutions please do raise it. It is a good opportunity for me to learn and
> personally I am not a big fan of either do-while or the introduced "err"
> which is used only by p2m_teardown_allocation(d), considering the
> p2m_final_teardown(d) has a void return type...

Personally I would probably have written

    while ( p2m_teardown_allocation(d) == -ERESTART )
        /* Nothing - no preemption support here. */;

or

    while ( p2m_teardown_allocation(d) == -ERESTART )
        continue; /* No preemption support here. */

. Otoh with the "err" variable you could ASSERT(!err) after the loop.

Jan
Henry Wang Oct. 14, 2022, 12:10 p.m. UTC | #13
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> 
> On 14.10.2022 13:04, Henry Wang wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >>
> >>> My initial thought is the "else if" part in p2m_set_allocation. It might be
> >>> wrong. Would the code below seems ok to you?
> >>>
> >>> int err;
> >>>
> >>> do {
> >>>     err = p2m_teardown_allocation(d)
> >>> } while ( err == -ERESTART )
> >>
> >> Sure, one of several ways of doing it.
> >
> > Thanks for your confirmation. Just to play safe if you have more simple
> > Solutions please do raise it. It is a good opportunity for me to learn and
> > personally I am not a big fan of either do-while or the introduced "err"
> > which is used only by p2m_teardown_allocation(d), considering the
> > p2m_final_teardown(d) has a void return type...
> 
> Personally I would probably have written
> 
>     while ( p2m_teardown_allocation(d) == -ERESTART )
>         /* Nothing - no preemption support here. */;

Thanks very much for the suggestions! I didn't think of the /* */
part and I really like this idea. This said, a quick search of different
coding styles and I found [1] mentioned:
"Empty loop bodies should use either empty braces or continue. "
So I will probably follow...

> 
> or
> 
>     while ( p2m_teardown_allocation(d) == -ERESTART )
>         continue; /* No preemption support here. */

...this way. Great experience of learning, thanks!

[1] https://google.github.io/styleguide/cppguide.html

Kind regards,
Henry

> 
> . Otoh with the "err" variable you could ASSERT(!err) after the loop.
> 
> Jan
Julien Grall Oct. 15, 2022, 10:58 a.m. UTC | #14
On 14/10/2022 12:19, Henry Wang wrote:
> Hi Julien,

Hi Henry,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>>>        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>        unsigned long count = 0;
>>> @@ -1716,7 +1716,7 @@ int p2m_teardown(struct domain *d)
>>>            p2m_free_page(p2m->domain, pg);
>>>            count++;
>>>            /* Arbitrarily preempt every 512 iterations */
>>> -        if ( !(count % 512) && hypercall_preempt_check() )
>>> +        if ( allow_preemption && !(count % 512) &&
>> hypercall_preempt_check() )
>>>            {
>>>                rc = -ERESTART;
>>>                break;
>>> @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
>>>        if ( !p2m->domain )
>>>            return;
>>>
>>> +    if ( !page_list_empty(&p2m->pages) )
>>
>> Did you add this check to avoid the clean & invalidate if the list is empty?
> 
> Yep. I think we only need the p2m_teardown() if we actually have something
> in p2m->pages list.

How about adding the check in p2m_teardown()? So it will be easier to 
remember that the check can be dropped if we move the zeroing outside of 
the function.

> 
>>
>>> +        p2m_teardown(d, false);
>>
>> Today, it should be fine to ignore p2m_teardown(). But I would prefer if
>> we add an ASSERT()/BUG_ON() (or else) to make confirm this is the case.
> 
> Sorry I do not really understand why we can ignore the p2m_teardown()
> probably because of my English.

No, I forgot a word in my sentence. I was meant to say that the return 
of p2m_teardown() can be ignored in our situation because it only return 
0 or -ERESTART. The latter cannnot happen when the preemption is not 
enabled.

But I would like to add some code (either ASSERT() or BUG_ON()) to 
confirm that p2m_teardown() will always return 0.

> Let's talk a bit more in C if you don't mind :))
> Do you mean p2m_teardown() should be called here unconditionally without
> the if ( !page_list_empty(&p2m->pages) ) check?

See above.

> 
>>
>> This also wants to be documented on top of p2m_teardown() as it would be
>> easier to know that the function should always return 0 when
>> !allow_preemption is not set.
> 
> Ok, will do.
> 
>>
>> I also noticed that relinquish_p2m_mapping() is not called. This should
>> be fine for us because arch_domain_create() should never create a
>> mapping that requires p2m_put_l3_page() to be called.
>>
>> I think it would be good to check it in __p2m_set_entry(). So we don't
>> end up to add such mappings by mistake.
> 
> I thought for a while but failed to translate the above requirements
> to proper if conditions in __p2m_set_entry()...

For checking the mapping, we can do:

if ( !removing_mapping && (p2m_is_foreign(t) || (p2m_is_ram(t) && 
is_xenheap_mfn(mfn) )
     return -EINVAL;

We also need a way to check whether we are called from 
arch_domain_create(). I think we would need a field in the domain 
structure to indicate whether it is still initializating.

This is a bit ugly though. Any other suggestions?

> 
>>
>> I would have suggested to add a comment only for version and send a
>> follow-up patch. But I don't exactly know where to put it.
> 
> ...how about p2m_final_teardown(), we can use a TODO to explain why
> we don't need to call relinquish_p2m_mapping() and a following patch
> can fix this?

To me the TODO would make more sense on top of p2m_set_entry() because 
this is where the issue should be fixed. This is also where most of the 
reader will likely look if they want to understand how p2m_set_entry() 
can be used.

We could also have a comment in p2m_final_teardown() stating that the 
relinquish function is not called because the P2M should not contain any 
mapping that requires specific operation when removed. This could point 
to the comment in p2m_set_entry().

Cheers,
Henry Wang Oct. 15, 2022, 1:14 p.m. UTC | #15
Hi Julien,

Thanks for reply and sharing your opinions!

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> On 14/10/2022 12:19, Henry Wang wrote:
> > Hi Julien,
> 
> Hi Henry,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >>>        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >>>        unsigned long count = 0;
> >>> @@ -1716,7 +1716,7 @@ int p2m_teardown(struct domain *d)
> >>>            p2m_free_page(p2m->domain, pg);
> >>>            count++;
> >>>            /* Arbitrarily preempt every 512 iterations */
> >>> -        if ( !(count % 512) && hypercall_preempt_check() )
> >>> +        if ( allow_preemption && !(count % 512) &&
> >> hypercall_preempt_check() )
> >>>            {
> >>>                rc = -ERESTART;
> >>>                break;
> >>> @@ -1736,6 +1736,17 @@ void p2m_final_teardown(struct domain *d)
> >>>        if ( !p2m->domain )
> >>>            return;
> >>>
> >>> +    if ( !page_list_empty(&p2m->pages) )
> >>
> >> Did you add this check to avoid the clean & invalidate if the list is empty?
> >
> > Yep. I think we only need the p2m_teardown() if we actually have
> something
> > in p2m->pages list.
> 
> How about adding the check in p2m_teardown()? So it will be easier to
> remember that the check can be dropped if we move the zeroing outside of
> the function.

Yes, I will turn above if check to a

if ( page_list_empty(&p2m->pages) )
    return 0;

in the beginning of the p2m_teardown(), and do the clean & invalidate
follow-up after the release.

> 
> >
> >>
> >>> +        p2m_teardown(d, false);
> >>
> >> Today, it should be fine to ignore p2m_teardown(). But I would prefer if
> >> we add an ASSERT()/BUG_ON() (or else) to make confirm this is the case.
> >
> > Sorry I do not really understand why we can ignore the p2m_teardown()
> > probably because of my English.
> 
> No, I forgot a word in my sentence. I was meant to say that the return
> of p2m_teardown() can be ignored in our situation because it only return
> 0 or -ERESTART. The latter cannnot happen when the preemption is not
> enabled.
> 
> But I would like to add some code (either ASSERT() or BUG_ON()) to
> confirm that p2m_teardown() will always return 0.

I added the doc asked in your previous email. Also, I will use a

ASSERT(p2m_teardown(d, false) == 0);

in p2m_final_teardown() here.

> 
> > Let's talk a bit more in C if you don't mind :))
> > Do you mean p2m_teardown() should be called here unconditionally
> without
> > the if ( !page_list_empty(&p2m->pages) ) check?
> 
> See above.

Thanks.

> 
> >
> >>
> >> This also wants to be documented on top of p2m_teardown() as it would
> be
> >> easier to know that the function should always return 0 when
> >> !allow_preemption is not set.
> >
> > Ok, will do.
> >
> >>
> >> I also noticed that relinquish_p2m_mapping() is not called. This should
> >> be fine for us because arch_domain_create() should never create a
> >> mapping that requires p2m_put_l3_page() to be called.
> >>
> >> I think it would be good to check it in __p2m_set_entry(). So we don't
> >> end up to add such mappings by mistake.
> >
> > I thought for a while but failed to translate the above requirements
> > to proper if conditions in __p2m_set_entry()...
> 
> For checking the mapping, we can do:
> 
> if ( !removing_mapping && (p2m_is_foreign(t) || (p2m_is_ram(t) &&
> is_xenheap_mfn(mfn) )
>      return -EINVAL;

Thanks for this, I guess without your hint it will take ages for me to
think of this.... 

> 
> We also need a way to check whether we are called from
> arch_domain_create(). I think we would need a field in the domain
> structure to indicate whether it is still initializating.
> 
> This is a bit ugly though. Any other suggestions?

My first thought is checking the implementation of domain_create()
and arch_domain_create() (as both will call arch_domain_destroy()
when fail) to see if there are some fields in struct domain or
struct arch_domain that are set/changed in this stage so probably we
can reuse. Otherwise I think adding a new field sounds a good idea.

> 
> >
> >>
> >> I would have suggested to add a comment only for version and send a
> >> follow-up patch. But I don't exactly know where to put it.
> >
> > ...how about p2m_final_teardown(), we can use a TODO to explain why
> > we don't need to call relinquish_p2m_mapping() and a following patch
> > can fix this?
> 
> To me the TODO would make more sense on top of p2m_set_entry()
> because
> this is where the issue should be fixed. This is also where most of the
> reader will likely look if they want to understand how p2m_set_entry()
> can be used.

Good idea, thanks for the suggestion!

> 
> We could also have a comment in p2m_final_teardown() stating that the
> relinquish function is not called because the P2M should not contain any
> mapping that requires specific operation when removed. This could point
> to the comment in p2m_set_entry().

Yes, my current wording for this would be:
+    /*
+     * No need to call relinquish_p2m_mapping() here because
+     * p2m_final_teardown() is called either after domain_relinquish_resources()
+     * where relinquish_p2m_mapping() has been called, or from failure path of
+     * domain_create()/arch_domain_create() where mappings that require
+     * p2m_put_l3_page() should never be created.
+     */

I will add the words pointing to p2m_set_entry().

Kind regards,
Henry


> 
> Cheers,
> 
> --
> Julien Grall
Jan Beulich Oct. 17, 2022, 8:40 a.m. UTC | #16
On 15.10.2022 15:14, Henry Wang wrote:
> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> On 14/10/2022 12:19, Henry Wang wrote:
>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>>> +        p2m_teardown(d, false);
>>>>
>>>> Today, it should be fine to ignore p2m_teardown(). But I would prefer if
>>>> we add an ASSERT()/BUG_ON() (or else) to make confirm this is the case.
>>>
>>> Sorry I do not really understand why we can ignore the p2m_teardown()
>>> probably because of my English.
>>
>> No, I forgot a word in my sentence. I was meant to say that the return
>> of p2m_teardown() can be ignored in our situation because it only return
>> 0 or -ERESTART. The latter cannnot happen when the preemption is not
>> enabled.
>>
>> But I would like to add some code (either ASSERT() or BUG_ON()) to
>> confirm that p2m_teardown() will always return 0.
> 
> I added the doc asked in your previous email. Also, I will use a
> 
> ASSERT(p2m_teardown(d, false) == 0);
> 
> in p2m_final_teardown() here.

Hopefully this was meant only as an abstract plan, not the exact code
you mean to add? ASSERT() expressions generally should not have side
effects (which includes function calls).

Jan
Henry Wang Oct. 17, 2022, 8:43 a.m. UTC | #17
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> > I added the doc asked in your previous email. Also, I will use a
> >
> > ASSERT(p2m_teardown(d, false) == 0);
> >
> > in p2m_final_teardown() here.
> 
> Hopefully this was meant only as an abstract plan, not the exact code
> you mean to add? ASSERT() expressions generally should not have side
> effects (which includes function calls).

Yeah, when I wrote the v3 code I noticed that ASSERT might be limited
to the CONFIG_DEBUG so in the v3 I switched to BUG_ON which IIUC
can make sure the function call is valid all the time.

Kind regards,
Henry



> 
> Jan
Jan Beulich Oct. 17, 2022, 8:57 a.m. UTC | #18
On 17.10.2022 10:43, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>> I added the doc asked in your previous email. Also, I will use a
>>>
>>> ASSERT(p2m_teardown(d, false) == 0);
>>>
>>> in p2m_final_teardown() here.
>>
>> Hopefully this was meant only as an abstract plan, not the exact code
>> you mean to add? ASSERT() expressions generally should not have side
>> effects (which includes function calls).
> 
> Yeah, when I wrote the v3 code I noticed that ASSERT might be limited
> to the CONFIG_DEBUG so in the v3 I switched to BUG_ON which IIUC
> can make sure the function call is valid all the time.

But in the past we still recommended against doing so even with BUG_ON().
More recently Andrew (iirc) has voiced an opinion in the opposite
direction, but I'm not aware of us actually having changed direction in
this regard. Apart from that ASSERT() and BUG_ON() aren't meant to be
freely interchanged - the the respective part of ./CODFING_STYLE.

Jan
Henry Wang Dec. 8, 2022, 3:06 a.m. UTC | #19
Hi Julien, Stefano, Bertrand,

I am trying to work on the follow-up improvements about the Arm P2M code,
and while trying to address the comment below, I noticed there was an unfinished
discussion between me and Julien which I would like to continue and here
opinions from all of you (if possible).

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> >> I also noticed that relinquish_p2m_mapping() is not called. This should
> >> be fine for us because arch_domain_create() should never create a
> >> mapping that requires p2m_put_l3_page() to be called.

For the background of the discussion, this was about the failure path of
arch_domain_create(), where we only call p2m_final_teardown() which does
not call relinquish_p2m_mapping()...

> >>
> >> I think it would be good to check it in __p2m_set_entry(). So we don't
> >> end up to add such mappings by mistake.
> 
> For checking the mapping, we can do:
> 
> if ( !removing_mapping && (p2m_is_foreign(t) || (p2m_is_ram(t) &&
> is_xenheap_mfn(mfn)) )
>      return -EINVAL;
> 
> We also need a way to check whether we are called from
> arch_domain_create(). I think we would need a field in the domain
> structure to indicate whether it is still initializating.
> 
> This is a bit ugly though. Any other suggestions?

...and I agree with Julien that this check makes great sense and I will add
this check in the follow up patch as discussed.

However I am not really convinced this looks pretty, and I would like to
hear opinions / suggestions about below code, does below code snippet
seem plausible to you?

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
@@ -1043,6 +1043,10 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
      */
     ASSERT(target > 0 && target <= 3);

+    if ( !removing_mapping && d->arch.p2m.from_arch_domain_create && 
+         (p2m_is_foreign(t) || (p2m_is_ram(t) && is_xenheap_mfn(mfn)) )
+        return -EINVAL;
+
     table = p2m_get_root_pointer(p2m, sgfn);

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
@@ -687,6 +687,8 @@ int arch_domain_create(struct domain *d,
     if ( (rc = p2m_init(d)) != 0 )
         goto fail;

+    d->arch.p2m.from_arch_domain_create = true;
+
     rc = -ENOMEM;
     if ( (d->shared_info = alloc_xenheap_pages(0, 0)) == NULL )
         goto fail;
@@ -751,6 +753,8 @@ int arch_domain_create(struct domain *d,
     if ( (rc = domain_vpci_init(d)) != 0 )
         goto fail;

+    d->arch.p2m.from_arch_domain_create = false;
+
     return 0;

Kind regards,
Henry
Julien Grall Dec. 9, 2022, 6:51 p.m. UTC | #20
Hi Henry,

On 08/12/2022 03:06, Henry Wang wrote:
> I am trying to work on the follow-up improvements about the Arm P2M code,
> and while trying to address the comment below, I noticed there was an unfinished
> discussion between me and Julien which I would like to continue and here
> opinions from all of you (if possible).
> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
>> arch_domain_create()
>>>> I also noticed that relinquish_p2m_mapping() is not called. This should
>>>> be fine for us because arch_domain_create() should never create a
>>>> mapping that requires p2m_put_l3_page() to be called.
> 
> For the background of the discussion, this was about the failure path of
> arch_domain_create(), where we only call p2m_final_teardown() which does
> not call relinquish_p2m_mapping()...
So all this mess with the P2M is necessary because we are mapping the 
GICv2 CPU interface in arch_domain_create(). I think we should consider 
to defer the mapping to later.

Other than it makes the code simpler, it also means we could also the 
P2M root on the same numa node the domain is going to run (those 
information are passed later on).

There is a couple of options:
  1. Introduce a new hypercall to finish the initialization of a domain 
(e.g. allocating the P2M and map the GICv2 CPU interface). This option 
was briefly discussed with Jan (see [2])./
  2. Allocate the P2M when populate the P2M pool and defer the GICv2 CPU 
interface mapping until the first access (similar to how with deal with 
MMIO access for ACPI).

I find the second option is neater but it has the drawback that it 
requires on more trap to the hypervisor and we can't report any mapping 
failure (which should not happen if the P2M was correctly sized). So I 
am leaning towards option 2.

Any opinions?

Cheers,

[1] 
https://lore.kernel.org/xen-devel/6c07cdfc-888a-45bb-2077-528a809a62f4@suse.com/
Stefano Stabellini Dec. 9, 2022, 10:11 p.m. UTC | #21
On Fri, 9 Dec 2022, Julien Grall wrote:
> Hi Henry,
> 
> On 08/12/2022 03:06, Henry Wang wrote:
> > I am trying to work on the follow-up improvements about the Arm P2M code,
> > and while trying to address the comment below, I noticed there was an
> > unfinished
> > discussion between me and Julien which I would like to continue and here
> > opinions from all of you (if possible).
> > 
> > > -----Original Message-----
> > > From: Julien Grall <julien@xen.org>
> > > Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> > > arch_domain_create()
> > > > > I also noticed that relinquish_p2m_mapping() is not called. This
> > > > > should
> > > > > be fine for us because arch_domain_create() should never create a
> > > > > mapping that requires p2m_put_l3_page() to be called.
> > 
> > For the background of the discussion, this was about the failure path of
> > arch_domain_create(), where we only call p2m_final_teardown() which does
> > not call relinquish_p2m_mapping()...
> So all this mess with the P2M is necessary because we are mapping the GICv2
> CPU interface in arch_domain_create(). I think we should consider to defer the
> mapping to later.
> 
> Other than it makes the code simpler, it also means we could also the P2M root
> on the same numa node the domain is going to run (those information are passed
> later on).
> 
> There is a couple of options:
>  1. Introduce a new hypercall to finish the initialization of a domain (e.g.
> allocating the P2M and map the GICv2 CPU interface). This option was briefly
> discussed with Jan (see [2])./
>  2. Allocate the P2M when populate the P2M pool and defer the GICv2 CPU
> interface mapping until the first access (similar to how with deal with MMIO
> access for ACPI).
> 
> I find the second option is neater but it has the drawback that it requires on
> more trap to the hypervisor and we can't report any mapping failure (which
> should not happen if the P2M was correctly sized). So I am leaning towards
> option 2.
> 
> Any opinions?

Option 1 is not great due to the extra hypercall. But I worry that
option 2 might make things harder for safety because the
mapping/initialization becomes "dynamic". I don't know if this is a
valid concern.

I would love to hear Bertrand's thoughts about it. Putting him in To:
Andrew Cooper Dec. 9, 2022, 10:18 p.m. UTC | #22
On 09/12/2022 18:51, Julien Grall wrote:
> Hi Henry,
>
> On 08/12/2022 03:06, Henry Wang wrote:
>> I am trying to work on the follow-up improvements about the Arm P2M
>> code,
>> and while trying to address the comment below, I noticed there was an
>> unfinished
>> discussion between me and Julien which I would like to continue and here
>> opinions from all of you (if possible).
>>
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2
>>> mapping in
>>> arch_domain_create()
>>>>> I also noticed that relinquish_p2m_mapping() is not called. This
>>>>> should
>>>>> be fine for us because arch_domain_create() should never create a
>>>>> mapping that requires p2m_put_l3_page() to be called.
>>
>> For the background of the discussion, this was about the failure path of
>> arch_domain_create(), where we only call p2m_final_teardown() which does
>> not call relinquish_p2m_mapping()...
> So all this mess with the P2M is necessary because we are mapping the
> GICv2 CPU interface in arch_domain_create(). I think we should
> consider to defer the mapping to later.
>
> Other than it makes the code simpler, it also means we could also the
> P2M root on the same numa node the domain is going to run (those
> information are passed later on).
>
> There is a couple of options:
>  1. Introduce a new hypercall to finish the initialization of a domain
> (e.g. allocating the P2M and map the GICv2 CPU interface). This option
> was briefly discussed with Jan (see [2])./
>  2. Allocate the P2M when populate the P2M pool and defer the GICv2
> CPU interface mapping until the first access (similar to how with deal
> with MMIO access for ACPI).
>
> I find the second option is neater but it has the drawback that it
> requires on more trap to the hypervisor and we can't report any
> mapping failure (which should not happen if the P2M was correctly
> sized). So I am leaning towards option 2.
>
> Any opinions?

A DOMCTL_creation_finished hypercall (name subject to improvement) is
mandatory for encrypted VM support in x86 (there's crypto needed at this
point to complete the measurement of the guest and create an attestation
report), so we are going to gain something to this effect one way or
another.

Such a hypercall also removes the giant bodge of using
domain_unpause_by_systemcontroller() for this purpose.

But it seems like ARM already has arch_domain_creation_finished() so you
can do 1) independently of adding a new hypercall.  x86 uses that hook
for hooking up the magic APICv sinc page into the p2m, so you're already
in good(?) company with a GIC bodge.


As to where the logic *should* be, it should be done in the hypercall(s)
where the toolstack is describing the guests phymap to Xen.  The fact
that these don't exist is yet another problem needing to be worked on.


That said, moving the creation side of things doesn't change the
teardown requirements.  When I get time to respin the fault_ttl series,
Gitlab CI will be able to demonstrate the bug I keep on telling you is
still there, the fix for which is taking the patch I already wrote for
you.  There is no correct way to move the final loop out of
complete_domain_destroy(), even if in the general case you can make it
more preemptible by moving the allocation later.

~Andrew
Julien Grall Dec. 9, 2022, 11:24 p.m. UTC | #23
Hi Andrew,

On 09/12/2022 22:18, Andrew Cooper wrote:
> On 09/12/2022 18:51, Julien Grall wrote:
> That said, moving the creation side of things doesn't change the
> teardown requirements.  When I get time to respin the fault_ttl series,
> Gitlab CI will be able to demonstrate the bug I keep on telling you is
> still there, the fix for which is taking the patch I already wrote for
> you.

Is it just a matter of rebasing the series? If so, I am happy to give a 
try to see what comes up...

> There is no correct way to move the final loop out of
> complete_domain_destroy(), even if in the general case you can make it
> more preemptible by moving the allocation later.

Hmmm... I don't think the call to p2m_teardown() in 
complete_domain_destroy() will be necessary once we move the P2M 
allocation out of arch_domain_create().

This is because there will be no use of the P2M after 
domain_relinquish_resources().

Cheers,
Bertrand Marquis Jan. 3, 2023, 12:34 p.m. UTC | #24
Hi,

Sorry for the delay but I have very limited access to my mails right now.

On 9 Dec 2022, at 23:11, Stefano Stabellini <sstabellini@kernel.org<mailto:sstabellini@kernel.org>> wrote:

On Fri, 9 Dec 2022, Julien Grall wrote:
Hi Henry,

On 08/12/2022 03:06, Henry Wang wrote:
I am trying to work on the follow-up improvements about the Arm P2M code,
and while trying to address the comment below, I noticed there was an
unfinished
discussion between me and Julien which I would like to continue and here
opinions from all of you (if possible).

-----Original Message-----
From: Julien Grall <julien@xen.org<mailto:julien@xen.org>>
Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
arch_domain_create()
I also noticed that relinquish_p2m_mapping() is not called. This
should
be fine for us because arch_domain_create() should never create a
mapping that requires p2m_put_l3_page() to be called.

For the background of the discussion, this was about the failure path of
arch_domain_create(), where we only call p2m_final_teardown() which does
not call relinquish_p2m_mapping()...
So all this mess with the P2M is necessary because we are mapping the GICv2
CPU interface in arch_domain_create(). I think we should consider to defer the
mapping to later.

Other than it makes the code simpler, it also means we could also the P2M root
on the same numa node the domain is going to run (those information are passed
later on).

There is a couple of options:
1. Introduce a new hypercall to finish the initialization of a domain (e.g.
allocating the P2M and map the GICv2 CPU interface). This option was briefly
discussed with Jan (see [2])./
2. Allocate the P2M when populate the P2M pool and defer the GICv2 CPU
interface mapping until the first access (similar to how with deal with MMIO
access for ACPI).

I find the second option is neater but it has the drawback that it requires on
more trap to the hypervisor and we can't report any mapping failure (which
should not happen if the P2M was correctly sized). So I am leaning towards
option 2.

Any opinions?

Option 1 is not great due to the extra hypercall. But I worry that
option 2 might make things harder for safety because the
mapping/initialization becomes "dynamic". I don't know if this is a
valid concern.

I would love to hear Bertrand's thoughts about it. Putting him in To:

How option 1 would work for dom0less ?

Option 2 would make safety more challenging but not impossible (we have a lot of other use cases where we cannot map everything on boot).

I would vote for option 2 as I think we will not certify gicv2 and it is not adding an other hyper call.

Cheers
Bertrand
Julien Grall Jan. 8, 2023, 4:15 p.m. UTC | #25
Hi Bertrand,

On 03/01/2023 12:34, Bertrand Marquis wrote:
> Hi,
> 
> Sorry for the delay but I have very limited access to my mails right now.
> 
>> On 9 Dec 2022, at 23:11, Stefano Stabellini <sstabellini@kernel.org 
>> <mailto:sstabellini@kernel.org>> wrote:
>>
>> On Fri, 9 Dec 2022, Julien Grall wrote:
>>> Hi Henry,
>>>
>>> On 08/12/2022 03:06, Henry Wang wrote:
>>>> I am trying to work on the follow-up improvements about the Arm P2M 
>>>> code,
>>>> and while trying to address the comment below, I noticed there was an
>>>> unfinished
>>>> discussion between me and Julien which I would like to continue and here
>>>> opinions from all of you (if possible).
>>>>
>>>>> -----Original Message-----
>>>>> From: Julien Grall <julien@xen.org <mailto:julien@xen.org>>
>>>>> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 
>>>>> mapping in
>>>>> arch_domain_create()
>>>>>>> I also noticed that relinquish_p2m_mapping() is not called. This
>>>>>>> should
>>>>>>> be fine for us because arch_domain_create() should never create a
>>>>>>> mapping that requires p2m_put_l3_page() to be called.
>>>>
>>>> For the background of the discussion, this was about the failure path of
>>>> arch_domain_create(), where we only call p2m_final_teardown() which does
>>>> not call relinquish_p2m_mapping()...
>>> So all this mess with the P2M is necessary because we are mapping the 
>>> GICv2
>>> CPU interface in arch_domain_create(). I think we should consider to 
>>> defer the
>>> mapping to later.
>>>
>>> Other than it makes the code simpler, it also means we could also the 
>>> P2M root
>>> on the same numa node the domain is going to run (those information 
>>> are passed
>>> later on).
>>>
>>> There is a couple of options:
>>> 1. Introduce a new hypercall to finish the initialization of a domain 
>>> (e.g.
>>> allocating the P2M and map the GICv2 CPU interface). This option was 
>>> briefly
>>> discussed with Jan (see [2])./
>>> 2. Allocate the P2M when populate the P2M pool and defer the GICv2 CPU
>>> interface mapping until the first access (similar to how with deal 
>>> with MMIO
>>> access for ACPI).
>>>
>>> I find the second option is neater but it has the drawback that it 
>>> requires on
>>> more trap to the hypervisor and we can't report any mapping failure 
>>> (which
>>> should not happen if the P2M was correctly sized). So I am leaning 
>>> towards
>>> option 2.
>>>
>>> Any opinions?
>>
>> Option 1 is not great due to the extra hypercall. But I worry that
>> option 2 might make things harder for safety because the
>> mapping/initialization becomes "dynamic". I don't know if this is a
>> valid concern.
>>
>> I would love to hear Bertrand's thoughts about it. Putting him in To:
> 
> How option 1 would work for dom0less ?

Xen would call the function directly. Effectively, this would the same 
as all the other "hypercalls" we are using to build a dom0less domain.

> 
> Option 2 would make safety more challenging but not impossible (we have 
> a lot of other use cases where we cannot map everything on boot).
> 
> I would vote for option 2 as I think we will not certify gicv2 and it is 
> not adding an other hyper call.

It sounds like option 2 is the preferred way for now. Henry, can you 
have a look?

Cheers,
Henry Wang Jan. 9, 2023, 1:40 a.m. UTC | #26
Hi Julien and Bertrand,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Subject: Re: [PATCH v2] xen/arm: p2m: Populate pages for GICv2 mapping in
> arch_domain_create()
> 
> Hi Bertrand,
> 
> On 03/01/2023 12:34, Bertrand Marquis wrote:
> > Hi,
> >
> > Sorry for the delay but I have very limited access to my mails right now.
> >
> >> On 9 Dec 2022, at 23:11, Stefano Stabellini <sstabellini@kernel.org
> >> <mailto:sstabellini@kernel.org>> wrote:
> >>
> >> On Fri, 9 Dec 2022, Julien Grall wrote:
> >>> Hi Henry,
> >>>
> >>> On 08/12/2022 03:06, Henry Wang wrote:
> >>>> I am trying to work on the follow-up improvements about the Arm P2M
> >>>> code,
> >>>> and while trying to address the comment below, I noticed there was an
> >>>> unfinished
> >>>> discussion between me and Julien which I would like to continue and
> here
> >>>> opinions from all of you (if possible).
> >>>>
> >>>> For the background of the discussion, this was about the failure path of
> >>>> arch_domain_create(), where we only call p2m_final_teardown() which
> does
> >>>> not call relinquish_p2m_mapping()...
> >>> So all this mess with the P2M is necessary because we are mapping the
> >>> GICv2
> >>> CPU interface in arch_domain_create(). I think we should consider to
> >>> defer the
> >>> mapping to later.
> >>>
> >>> Other than it makes the code simpler, it also means we could also the
> >>> P2M root
> >>> on the same numa node the domain is going to run (those information
> >>> are passed
> >>> later on).
> >>>
> >>> There is a couple of options:
> >>> 1. Introduce a new hypercall to finish the initialization of a domain
> >>> (e.g.
> >>> allocating the P2M and map the GICv2 CPU interface). This option was
> >>> briefly
> >>> discussed with Jan (see [2])./
> >>> 2. Allocate the P2M when populate the P2M pool and defer the GICv2
> CPU
> >>> interface mapping until the first access (similar to how with deal
> >>> with MMIO
> >>> access for ACPI).
> >>>
> >>> I find the second option is neater but it has the drawback that it
> >>> requires on
> >>> more trap to the hypervisor and we can't report any mapping failure
> >>> (which
> >>> should not happen if the P2M was correctly sized). So I am leaning
> >>> towards
> >>> option 2.
> >>>
> >>> Any opinions?
> >>
> >> Option 1 is not great due to the extra hypercall. But I worry that
> >> option 2 might make things harder for safety because the
> >> mapping/initialization becomes "dynamic". I don't know if this is a
> >> valid concern.
> >>
> >> I would love to hear Bertrand's thoughts about it. Putting him in To:
> >
> > How option 1 would work for dom0less ?
> 
> Xen would call the function directly. Effectively, this would the same
> as all the other "hypercalls" we are using to build a dom0less domain.
> 
> >
> > Option 2 would make safety more challenging but not impossible (we have
> > a lot of other use cases where we cannot map everything on boot).
> >
> > I would vote for option 2 as I think we will not certify gicv2 and it is
> > not adding an other hyper call.

Thanks for the input.

> 
> It sounds like option 2 is the preferred way for now. Henry, can you
> have a look?

Yes, I would love to prepare some patches to follow the option 2.

Kind regards,
Henry

> 
> Cheers,
> 
> --
> Julien Grall
diff mbox series

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2c84e6dbbb..831e248ad7 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -740,6 +740,20 @@  int arch_domain_create(struct domain *d,
         BUG();
     }
 
+    /*
+     * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
+     * when the domain is created. Considering the worst case for page
+     * tables and keep a buffer, populate 16 pages to the P2M pages pool here.
+     * For GICv3, the above-mentioned P2M mapping is not necessary, but since
+     * the allocated 16 pages here would not be lost, hence populate these
+     * pages unconditionally.
+     */
+    spin_lock(&d->arch.paging.lock);
+    rc = p2m_set_allocation(d, 16, NULL);
+    spin_unlock(&d->arch.paging.lock);
+    if ( rc != 0 )
+        goto fail;
+
     if ( (rc = domain_vgic_register(d, &count)) != 0 )
         goto fail;
 
@@ -1064,7 +1078,7 @@  int domain_relinquish_resources(struct domain *d)
             return ret;
 
     PROGRESS(p2m):
-        ret = p2m_teardown(d);
+        ret = p2m_teardown(d, true);
         if ( ret )
             return ret;
 
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index 42bfd548c4..480d65e95e 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -194,14 +194,17 @@  int p2m_init(struct domain *d);
 
 /*
  * The P2M resources are freed in two parts:
- *  - p2m_teardown() will be called when relinquish the resources. It
- *    will free large resources (e.g. intermediate page-tables) that
- *    requires preemption.
+ *  - p2m_teardown() will be called preemptively when relinquish the
+ *    resources, in which case it will free large resources (e.g. intermediate
+ *    page-tables) that requires preemption.
  *  - p2m_final_teardown() will be called when domain struct is been
  *    freed. This *cannot* be preempted and therefore one small
  *    resources should be freed here.
+ *  Note that p2m_final_teardown() will also call p2m_teardown(), to properly
+ *  free the P2M when failures happen in the domain creation with P2M pages
+ *  already in use. In this case p2m_teardown() is called non-preemptively.
  */
-int p2m_teardown(struct domain *d);
+int p2m_teardown(struct domain *d, bool allow_preemption);
 void p2m_final_teardown(struct domain *d);
 
 /*
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index f17500ddf3..707bd3e2e3 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1685,7 +1685,7 @@  static void p2m_free_vmid(struct domain *d)
     spin_unlock(&vmid_alloc_lock);
 }
 
-int p2m_teardown(struct domain *d)
+int p2m_teardown(struct domain *d, bool allow_preemption)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     unsigned long count = 0;
@@ -1716,7 +1716,7 @@  int p2m_teardown(struct domain *d)
         p2m_free_page(p2m->domain, pg);
         count++;
         /* Arbitrarily preempt every 512 iterations */
-        if ( !(count % 512) && hypercall_preempt_check() )
+        if ( allow_preemption && !(count % 512) && hypercall_preempt_check() )
         {
             rc = -ERESTART;
             break;
@@ -1736,6 +1736,17 @@  void p2m_final_teardown(struct domain *d)
     if ( !p2m->domain )
         return;
 
+    if ( !page_list_empty(&p2m->pages) )
+        p2m_teardown(d, false);
+
+    if ( d->arch.paging.p2m_total_pages != 0 )
+    {
+        spin_lock(&d->arch.paging.lock);
+        p2m_set_allocation(d, 0, NULL);
+        spin_unlock(&d->arch.paging.lock);
+        ASSERT(d->arch.paging.p2m_total_pages == 0);
+    }
+
     ASSERT(page_list_empty(&p2m->pages));
     ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));