[5/5] x86/p2m: split write_p2m_entry() hook

Message ID	7b2b7cc9-8828-41bd-7949-764161bbe7ff@suse.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=61/H=ED=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0E21424680 Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> Subject: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook From: Jan Beulich <jbeulich@suse.com> To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>, =?utf-8?q?Roger_Pau_Monn=C3=A9?= <roger.pau@citrix.com>, George Dunlap <George.Dunlap@eu.citrix.com>, Tim Deegan <tim@xen.org> References: <29d30de1-2a8d-aee2-d3c3-331758766fc9@suse.com> Message-ID: <7b2b7cc9-8828-41bd-7949-764161bbe7ff@suse.com> Date: Wed, 28 Oct 2020 10:24:53 +0100 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0 MIME-Version: 1.0 In-Reply-To: <29d30de1-2a8d-aee2-d3c3-331758766fc9@suse.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit
Series	x86/p2m: hook adjustments \| expand [0/5] x86/p2m: hook adjustments [1/5] x86/p2m: paging_write_p2m_entry() is a private function [2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks [3/5] x86/p2m: suppress audit_p2m hook when possible [4/5] x86/HAP: move nested-P2M flush calculations out of locked region [5/5] x86/p2m: split write_p2m_entry() hook

Jan Beulich Oct. 28, 2020, 9:24 a.m. UTC

Fair parts of the present handlers are identical; in fact
nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
common parts right into write_p2m_entry(), splitting the hooks into a
"pre" one (needed just by shadow code) and a "post" one.

For the common parts moved I think that the p2m_flush_nestedp2m() is,
at least from an abstract perspective, also applicable in the shadow
case. Hence it doesn't get a 3rd hook put in place.

The initial comment that was in hap_write_p2m_entry() gets dropped: Its
placement was bogus, and looking back the the commit introducing it
(dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
of a p2m it was meant to be associated with.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: This is effectively the alternative to the suggestion in an earlier
     patch that we might do away with the hook altogether. Of course a
     hybrid approach would also be possible, by using direct calls here
     instead of splitting the hook into two.
---
I'm unsure whether p2m_init_nestedp2m() zapping the "pre" hook is
actually correct, or whether previously the sh_unshadow_for_p2m_change()
invocation was wrongly skipped.

Tim Deegan Oct. 29, 2020, 8:46 p.m. UTC | #1

At 10:24 +0100 on 28 Oct (1603880693), Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back the the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
> of a p2m it was meant to be associated with.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This seems like a good approach to me.  I'm happy with the shadow
parts but am not confident enough on nested p2m any more to have an
opinion on that side. 

Acked-by: Tim Deegan <tim@xen.org>

Roger Pau Monné Nov. 10, 2020, 1:59 p.m. UTC | #2

On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back the the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
> of a p2m it was meant to be associated with.

Is there any performance implication of moving from one hook to two
hooks? Since this shouldn't be a hot path I assume it's fine.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: This is effectively the alternative to the suggestion in an earlier
>      patch that we might do away with the hook altogether. Of course a
>      hybrid approach would also be possible, by using direct calls here
>      instead of splitting the hook into two.
> ---
> I'm unsure whether p2m_init_nestedp2m() zapping the "pre" hook is
> actually correct, or whether previously the sh_unshadow_for_p2m_change()
> invocation was wrongly skipped.
> 
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -774,55 +774,18 @@ static void hap_update_paging_modes(stru
>      put_gfn(d, cr3_gfn);
>  }
>  
> -static int
> -hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p,
> -                    l1_pgentry_t new, unsigned int level)
> +static void
> +hap_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
>  {
>      struct domain *d = p2m->domain;
> -    uint32_t old_flags;
> -    mfn_t omfn;
> -    int rc;
>  
> -    /* We know always use the host p2m here, regardless if the vcpu
> -     * is in host or guest mode. The vcpu can be in guest mode by
> -     * a hypercall which passes a domain and chooses mostly the first
> -     * vcpu. */
> -
> -    paging_lock(d);
> -    old_flags = l1e_get_flags(*p);
> -    omfn = l1e_get_mfn(*p);
> -
> -    rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> -                          p2m_flags_to_type(old_flags), l1e_get_mfn(new),
> -                          omfn, level);
> -    if ( rc )
> -    {
> -        paging_unlock(d);
> -        return rc;
> -    }
> -
> -    safe_write_pte(p, new);
> -    if ( old_flags & _PAGE_PRESENT )
> +    if ( oflags & _PAGE_PRESENT )
>          guest_flush_tlb_mask(d, d->dirty_cpumask);
> -
> -    paging_unlock(d);
> -
> -    if ( nestedhvm_enabled(d) && (old_flags & _PAGE_PRESENT) &&
> -         !p2m_get_hostp2m(d)->defer_nested_flush &&
> -         /*
> -          * We are replacing a valid entry so we need to flush nested p2ms,
> -          * unless the only change is an increase in access rights.
> -          */
> -         (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> -          !perms_strictly_increased(old_flags, l1e_get_flags(new))) )
> -        p2m_flush_nestedp2m(d);
> -
> -    return 0;
>  }
>  
>  void hap_p2m_init(struct p2m_domain *p2m)
>  {
> -    p2m->write_p2m_entry = hap_write_p2m_entry;
> +    p2m->write_p2m_entry_post = hap_write_p2m_entry_post;
>  }
>  
>  static unsigned long hap_gva_to_gfn_real_mode(
> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -71,24 +71,11 @@
>  /*        NESTED VIRT P2M FUNCTIONS         */
>  /********************************************/
>  
> -int
> -nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> -    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
> +void
> +nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
>  {
> -    struct domain *d = p2m->domain;
> -    uint32_t old_flags;
> -
> -    paging_lock(d);
> -
> -    old_flags = l1e_get_flags(*p);
> -    safe_write_pte(p, new);
> -
> -    if (old_flags & _PAGE_PRESENT)
> -        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
> -
> -    paging_unlock(d);
> -
> -    return 0;
> +    if ( oflags & _PAGE_PRESENT )
> +        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
>  }

This is a verbatimi copy of hap_write_p2m_entry_post. I assume there's
a reason why we need both, but I'm missing it.

>  
>  /********************************************/
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>  {
>      struct domain *d = p2m->domain;
>      struct vcpu *v = current;
> -    int rc = 0;
>  
>      if ( v->domain != d )
>          v = d->vcpu ? d->vcpu[0] : NULL;
>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>           p2m_is_nestedp2m(p2m) )
> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
> +    {
> +        unsigned int oflags;
> +        mfn_t omfn;
> +        int rc;
> +
> +        paging_lock(d);
> +
> +        if ( p2m->write_p2m_entry_pre )
> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
> +
> +        oflags = l1e_get_flags(*p);
> +        omfn = l1e_get_mfn(*p);
> +
> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
> +                              omfn, level);
> +        if ( rc )
> +        {
> +            paging_unlock(d);
> +            return rc;
> +        }
> +
> +        safe_write_pte(p, new);
> +
> +        if ( p2m->write_p2m_entry_post )
> +            p2m->write_p2m_entry_post(p2m, oflags);
> +
> +        paging_unlock(d);
> +
> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
> +             (oflags & _PAGE_PRESENT) &&
> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
> +             /*
> +              * We are replacing a valid entry so we need to flush nested p2ms,
> +              * unless the only change is an increase in access rights.
> +              */
> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
> +            p2m_flush_nestedp2m(d);

It feel slightly weird to have a nested p2m hook post, and yet have
nested specific code here.

Have you considered if the post hook could be moved outside of the
locked region, so that we could put this chunk there in the nested p2m
case?

Thanks, Roger.

Jan Beulich Nov. 10, 2020, 2:50 p.m. UTC | #3

On 10.11.2020 14:59, Roger Pau Monné wrote:
> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
>> Fair parts of the present handlers are identical; in fact
>> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
>> common parts right into write_p2m_entry(), splitting the hooks into a
>> "pre" one (needed just by shadow code) and a "post" one.
>>
>> For the common parts moved I think that the p2m_flush_nestedp2m() is,
>> at least from an abstract perspective, also applicable in the shadow
>> case. Hence it doesn't get a 3rd hook put in place.
>>
>> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
>> placement was bogus, and looking back the the commit introducing it
>> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
>> of a p2m it was meant to be associated with.
> 
> Is there any performance implication of moving from one hook to two
> hooks? Since this shouldn't be a hot path I assume it's fine.

Well, first of all just a couple of patches ago two indirect
calls were folded into one, so it's at least not getting worse
compared to where we started from. And then both HAP and nested
install just one of the two hooks.

As per the remark in an earlier patch, referred to ...

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> RFC: This is effectively the alternative to the suggestion in an earlier
>>      patch that we might do away with the hook altogether. Of course a
>>      hybrid approach would also be possible, by using direct calls here
>>      instead of splitting the hook into two.

... here, there is the option of doing away with the indirect
calls altogether, but - as said earlier - at the price of at
least coming close to a layering violation.

>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>> @@ -71,24 +71,11 @@
>>  /*        NESTED VIRT P2M FUNCTIONS         */
>>  /********************************************/
>>  
>> -int
>> -nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>> -    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
>> +void
>> +nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
>>  {
>> -    struct domain *d = p2m->domain;
>> -    uint32_t old_flags;
>> -
>> -    paging_lock(d);
>> -
>> -    old_flags = l1e_get_flags(*p);
>> -    safe_write_pte(p, new);
>> -
>> -    if (old_flags & _PAGE_PRESENT)
>> -        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
>> -
>> -    paging_unlock(d);
>> -
>> -    return 0;
>> +    if ( oflags & _PAGE_PRESENT )
>> +        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
>>  }
> 
> This is a verbatimi copy of hap_write_p2m_entry_post. I assume there's
> a reason why we need both, but I'm missing it.

Only almost, since HAP has

    if ( oflags & _PAGE_PRESENT )
        guest_flush_tlb_mask(d, d->dirty_cpumask);

instead (i.e. they differ in which dirty_cpumask gets used).

>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>>  {
>>      struct domain *d = p2m->domain;
>>      struct vcpu *v = current;
>> -    int rc = 0;
>>  
>>      if ( v->domain != d )
>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>>           p2m_is_nestedp2m(p2m) )
>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
>> +    {
>> +        unsigned int oflags;
>> +        mfn_t omfn;
>> +        int rc;
>> +
>> +        paging_lock(d);
>> +
>> +        if ( p2m->write_p2m_entry_pre )
>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
>> +
>> +        oflags = l1e_get_flags(*p);
>> +        omfn = l1e_get_mfn(*p);
>> +
>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
>> +                              omfn, level);
>> +        if ( rc )
>> +        {
>> +            paging_unlock(d);
>> +            return rc;
>> +        }
>> +
>> +        safe_write_pte(p, new);
>> +
>> +        if ( p2m->write_p2m_entry_post )
>> +            p2m->write_p2m_entry_post(p2m, oflags);
>> +
>> +        paging_unlock(d);
>> +
>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
>> +             (oflags & _PAGE_PRESENT) &&
>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
>> +             /*
>> +              * We are replacing a valid entry so we need to flush nested p2ms,
>> +              * unless the only change is an increase in access rights.
>> +              */
>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
>> +            p2m_flush_nestedp2m(d);
> 
> It feel slightly weird to have a nested p2m hook post, and yet have
> nested specific code here.
> 
> Have you considered if the post hook could be moved outside of the
> locked region, so that we could put this chunk there in the nested p2m
> case?

Yes, I did, but I don't think the post hook can be moved out. The
only alternative therefore would be a 3rd hook. And this hook would
then need to be installed on the host p2m for nested guests, as
opposed to nestedp2m_write_p2m_entry_post, which gets installed in
the nested p2m-s. As said in the description, the main reason I
decided against a 3rd hook is that I suppose the code here isn't
HAP-specific (while prior to this patch it was).

Jan

Roger Pau Monné Nov. 11, 2020, 12:17 p.m. UTC | #4

On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
> On 10.11.2020 14:59, Roger Pau Monné wrote:
> > On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> >> --- a/xen/arch/x86/mm/hap/nested_hap.c
> >> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> >> @@ -71,24 +71,11 @@
> >>  /*        NESTED VIRT P2M FUNCTIONS         */
> >>  /********************************************/
> >>  
> >> -int
> >> -nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> >> -    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
> >> +void
> >> +nestedp2m_write_p2m_entry_post(struct p2m_domain *p2m, unsigned int oflags)
> >>  {
> >> -    struct domain *d = p2m->domain;
> >> -    uint32_t old_flags;
> >> -
> >> -    paging_lock(d);
> >> -
> >> -    old_flags = l1e_get_flags(*p);
> >> -    safe_write_pte(p, new);
> >> -
> >> -    if (old_flags & _PAGE_PRESENT)
> >> -        guest_flush_tlb_mask(d, p2m->dirty_cpumask);
> >> -
> >> -    paging_unlock(d);
> >> -
> >> -    return 0;
> >> +    if ( oflags & _PAGE_PRESENT )
> >> +        guest_flush_tlb_mask(p2m->domain, p2m->dirty_cpumask);
> >>  }
> > 
> > This is a verbatimi copy of hap_write_p2m_entry_post. I assume there's
> > a reason why we need both, but I'm missing it.
> 
> Only almost, since HAP has
> 
>     if ( oflags & _PAGE_PRESENT )
>         guest_flush_tlb_mask(d, d->dirty_cpumask);
> 
> instead (i.e. they differ in which dirty_cpumask gets used).

Sorry, missed that bit.

> >> --- a/xen/arch/x86/mm/p2m-pt.c
> >> +++ b/xen/arch/x86/mm/p2m-pt.c
> >> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
> >>  {
> >>      struct domain *d = p2m->domain;
> >>      struct vcpu *v = current;
> >> -    int rc = 0;
> >>  
> >>      if ( v->domain != d )
> >>          v = d->vcpu ? d->vcpu[0] : NULL;
> >>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
> >>           p2m_is_nestedp2m(p2m) )
> >> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
> >> +    {
> >> +        unsigned int oflags;
> >> +        mfn_t omfn;
> >> +        int rc;
> >> +
> >> +        paging_lock(d);
> >> +
> >> +        if ( p2m->write_p2m_entry_pre )
> >> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
> >> +
> >> +        oflags = l1e_get_flags(*p);
> >> +        omfn = l1e_get_mfn(*p);
> >> +
> >> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> >> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
> >> +                              omfn, level);
> >> +        if ( rc )
> >> +        {
> >> +            paging_unlock(d);
> >> +            return rc;
> >> +        }
> >> +
> >> +        safe_write_pte(p, new);
> >> +
> >> +        if ( p2m->write_p2m_entry_post )
> >> +            p2m->write_p2m_entry_post(p2m, oflags);
> >> +
> >> +        paging_unlock(d);
> >> +
> >> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
> >> +             (oflags & _PAGE_PRESENT) &&
> >> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
> >> +             /*
> >> +              * We are replacing a valid entry so we need to flush nested p2ms,
> >> +              * unless the only change is an increase in access rights.
> >> +              */
> >> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> >> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
> >> +            p2m_flush_nestedp2m(d);
> > 
> > It feel slightly weird to have a nested p2m hook post, and yet have
> > nested specific code here.
> > 
> > Have you considered if the post hook could be moved outside of the
> > locked region, so that we could put this chunk there in the nested p2m
> > case?
> 
> Yes, I did, but I don't think the post hook can be moved out. The
> only alternative therefore would be a 3rd hook. And this hook would
> then need to be installed on the host p2m for nested guests, as
> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
> the nested p2m-s. As said in the description, the main reason I
> decided against a 3rd hook is that I suppose the code here isn't
> HAP-specific (while prior to this patch it was).

I'm not convinced the guest TLB flush needs to be performed while
holding the paging lock. The point of such flush is to invalidate any
intermediate guest visible translations that might now be invalid as a
result of the p2m change, but the paging lock doesn't affect the guest
in any way.

It's true that the dirty_cpumask might change, but I think we only
care that when returning from the function there are no stale cache
entries that contain the now invalid translation, and this can be
achieved equally by doing the flush outside of the locked region.

Roger.

Jan Beulich Nov. 12, 2020, 12:29 p.m. UTC | #5

On 11.11.2020 13:17, Roger Pau Monné wrote:
> On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
>> On 10.11.2020 14:59, Roger Pau Monné wrote:
>>> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/mm/p2m-pt.c
>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>>>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>>>>  {
>>>>      struct domain *d = p2m->domain;
>>>>      struct vcpu *v = current;
>>>> -    int rc = 0;
>>>>  
>>>>      if ( v->domain != d )
>>>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>>>>           p2m_is_nestedp2m(p2m) )
>>>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
>>>> +    {
>>>> +        unsigned int oflags;
>>>> +        mfn_t omfn;
>>>> +        int rc;
>>>> +
>>>> +        paging_lock(d);
>>>> +
>>>> +        if ( p2m->write_p2m_entry_pre )
>>>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
>>>> +
>>>> +        oflags = l1e_get_flags(*p);
>>>> +        omfn = l1e_get_mfn(*p);
>>>> +
>>>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
>>>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
>>>> +                              omfn, level);
>>>> +        if ( rc )
>>>> +        {
>>>> +            paging_unlock(d);
>>>> +            return rc;
>>>> +        }
>>>> +
>>>> +        safe_write_pte(p, new);
>>>> +
>>>> +        if ( p2m->write_p2m_entry_post )
>>>> +            p2m->write_p2m_entry_post(p2m, oflags);
>>>> +
>>>> +        paging_unlock(d);
>>>> +
>>>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
>>>> +             (oflags & _PAGE_PRESENT) &&
>>>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
>>>> +             /*
>>>> +              * We are replacing a valid entry so we need to flush nested p2ms,
>>>> +              * unless the only change is an increase in access rights.
>>>> +              */
>>>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
>>>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
>>>> +            p2m_flush_nestedp2m(d);
>>>
>>> It feel slightly weird to have a nested p2m hook post, and yet have
>>> nested specific code here.
>>>
>>> Have you considered if the post hook could be moved outside of the
>>> locked region, so that we could put this chunk there in the nested p2m
>>> case?
>>
>> Yes, I did, but I don't think the post hook can be moved out. The
>> only alternative therefore would be a 3rd hook. And this hook would
>> then need to be installed on the host p2m for nested guests, as
>> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
>> the nested p2m-s. As said in the description, the main reason I
>> decided against a 3rd hook is that I suppose the code here isn't
>> HAP-specific (while prior to this patch it was).
> 
> I'm not convinced the guest TLB flush needs to be performed while
> holding the paging lock. The point of such flush is to invalidate any
> intermediate guest visible translations that might now be invalid as a
> result of the p2m change, but the paging lock doesn't affect the guest
> in any way.
> 
> It's true that the dirty_cpumask might change, but I think we only
> care that when returning from the function there are no stale cache
> entries that contain the now invalid translation, and this can be
> achieved equally by doing the flush outside of the locked region.

I agree with all this. If only it was merely about TLB flushes. In
the shadow case, shadow_blow_all_tables() gets invoked, and that
one - looking at the other call sites - wants the paging lock held.

Additionally moving the stuff out of the locked region wouldn't
allow us to achieve the goal of moving the nested flush into the
hook, unless we introduced further hook handlers to be installed
on the host p2m-s of nested guests.

Jan

Roger Pau Monné Nov. 12, 2020, 1:07 p.m. UTC | #6

On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> On 11.11.2020 13:17, Roger Pau Monné wrote:
> > On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
> >> On 10.11.2020 14:59, Roger Pau Monné wrote:
> >>> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/mm/p2m-pt.c
> >>>> +++ b/xen/arch/x86/mm/p2m-pt.c
> >>>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
> >>>>  {
> >>>>      struct domain *d = p2m->domain;
> >>>>      struct vcpu *v = current;
> >>>> -    int rc = 0;
> >>>>  
> >>>>      if ( v->domain != d )
> >>>>          v = d->vcpu ? d->vcpu[0] : NULL;
> >>>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
> >>>>           p2m_is_nestedp2m(p2m) )
> >>>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
> >>>> +    {
> >>>> +        unsigned int oflags;
> >>>> +        mfn_t omfn;
> >>>> +        int rc;
> >>>> +
> >>>> +        paging_lock(d);
> >>>> +
> >>>> +        if ( p2m->write_p2m_entry_pre )
> >>>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
> >>>> +
> >>>> +        oflags = l1e_get_flags(*p);
> >>>> +        omfn = l1e_get_mfn(*p);
> >>>> +
> >>>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
> >>>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
> >>>> +                              omfn, level);
> >>>> +        if ( rc )
> >>>> +        {
> >>>> +            paging_unlock(d);
> >>>> +            return rc;
> >>>> +        }
> >>>> +
> >>>> +        safe_write_pte(p, new);
> >>>> +
> >>>> +        if ( p2m->write_p2m_entry_post )
> >>>> +            p2m->write_p2m_entry_post(p2m, oflags);
> >>>> +
> >>>> +        paging_unlock(d);
> >>>> +
> >>>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
> >>>> +             (oflags & _PAGE_PRESENT) &&
> >>>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
> >>>> +             /*
> >>>> +              * We are replacing a valid entry so we need to flush nested p2ms,
> >>>> +              * unless the only change is an increase in access rights.
> >>>> +              */
> >>>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
> >>>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
> >>>> +            p2m_flush_nestedp2m(d);
> >>>
> >>> It feel slightly weird to have a nested p2m hook post, and yet have
> >>> nested specific code here.
> >>>
> >>> Have you considered if the post hook could be moved outside of the
> >>> locked region, so that we could put this chunk there in the nested p2m
> >>> case?
> >>
> >> Yes, I did, but I don't think the post hook can be moved out. The
> >> only alternative therefore would be a 3rd hook. And this hook would
> >> then need to be installed on the host p2m for nested guests, as
> >> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
> >> the nested p2m-s. As said in the description, the main reason I
> >> decided against a 3rd hook is that I suppose the code here isn't
> >> HAP-specific (while prior to this patch it was).
> > 
> > I'm not convinced the guest TLB flush needs to be performed while
> > holding the paging lock. The point of such flush is to invalidate any
> > intermediate guest visible translations that might now be invalid as a
> > result of the p2m change, but the paging lock doesn't affect the guest
> > in any way.
> > 
> > It's true that the dirty_cpumask might change, but I think we only
> > care that when returning from the function there are no stale cache
> > entries that contain the now invalid translation, and this can be
> > achieved equally by doing the flush outside of the locked region.
> 
> I agree with all this. If only it was merely about TLB flushes. In
> the shadow case, shadow_blow_all_tables() gets invoked, and that
> one - looking at the other call sites - wants the paging lock held.

You got me confused here, I think you meant shadow_blow_tables?

The post hook for shadow could pick the lock again, as I don't think
the removal of the tables needs to be strictly done inside of the same
locked region?

Something to consider from a performance PoV.

> Additionally moving the stuff out of the locked region wouldn't
> allow us to achieve the goal of moving the nested flush into the
> hook, unless we introduced further hook handlers to be installed
> on the host p2m-s of nested guests.

Right, or else we would also need to add that chunk in the
non-nestedp2m hook also?

Maybe you could join both the nested and non-nested hooks and use a
different dirty bitmap for the flush?

Thanks, Roger.

Jan Beulich Nov. 12, 2020, 2:04 p.m. UTC | #7

On 12.11.2020 14:07, Roger Pau Monné wrote:
> On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
>> On 11.11.2020 13:17, Roger Pau Monné wrote:
>>> On Tue, Nov 10, 2020 at 03:50:44PM +0100, Jan Beulich wrote:
>>>> On 10.11.2020 14:59, Roger Pau Monné wrote:
>>>>> On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
>>>>>> --- a/xen/arch/x86/mm/p2m-pt.c
>>>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>>>>>> @@ -122,17 +122,55 @@ static int write_p2m_entry(struct p2m_do
>>>>>>  {
>>>>>>      struct domain *d = p2m->domain;
>>>>>>      struct vcpu *v = current;
>>>>>> -    int rc = 0;
>>>>>>  
>>>>>>      if ( v->domain != d )
>>>>>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>>>>>      if ( likely(v && paging_mode_enabled(d) && paging_get_hostmode(v)) ||
>>>>>>           p2m_is_nestedp2m(p2m) )
>>>>>> -        rc = p2m->write_p2m_entry(p2m, gfn, p, new, level);
>>>>>> +    {
>>>>>> +        unsigned int oflags;
>>>>>> +        mfn_t omfn;
>>>>>> +        int rc;
>>>>>> +
>>>>>> +        paging_lock(d);
>>>>>> +
>>>>>> +        if ( p2m->write_p2m_entry_pre )
>>>>>> +            p2m->write_p2m_entry_pre(d, gfn, p, new, level);
>>>>>> +
>>>>>> +        oflags = l1e_get_flags(*p);
>>>>>> +        omfn = l1e_get_mfn(*p);
>>>>>> +
>>>>>> +        rc = p2m_entry_modify(p2m, p2m_flags_to_type(l1e_get_flags(new)),
>>>>>> +                              p2m_flags_to_type(oflags), l1e_get_mfn(new),
>>>>>> +                              omfn, level);
>>>>>> +        if ( rc )
>>>>>> +        {
>>>>>> +            paging_unlock(d);
>>>>>> +            return rc;
>>>>>> +        }
>>>>>> +
>>>>>> +        safe_write_pte(p, new);
>>>>>> +
>>>>>> +        if ( p2m->write_p2m_entry_post )
>>>>>> +            p2m->write_p2m_entry_post(p2m, oflags);
>>>>>> +
>>>>>> +        paging_unlock(d);
>>>>>> +
>>>>>> +        if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) &&
>>>>>> +             (oflags & _PAGE_PRESENT) &&
>>>>>> +             !p2m_get_hostp2m(d)->defer_nested_flush &&
>>>>>> +             /*
>>>>>> +              * We are replacing a valid entry so we need to flush nested p2ms,
>>>>>> +              * unless the only change is an increase in access rights.
>>>>>> +              */
>>>>>> +             (!mfn_eq(omfn, l1e_get_mfn(new)) ||
>>>>>> +              !perms_strictly_increased(oflags, l1e_get_flags(new))) )
>>>>>> +            p2m_flush_nestedp2m(d);
>>>>>
>>>>> It feel slightly weird to have a nested p2m hook post, and yet have
>>>>> nested specific code here.
>>>>>
>>>>> Have you considered if the post hook could be moved outside of the
>>>>> locked region, so that we could put this chunk there in the nested p2m
>>>>> case?
>>>>
>>>> Yes, I did, but I don't think the post hook can be moved out. The
>>>> only alternative therefore would be a 3rd hook. And this hook would
>>>> then need to be installed on the host p2m for nested guests, as
>>>> opposed to nestedp2m_write_p2m_entry_post, which gets installed in
>>>> the nested p2m-s. As said in the description, the main reason I
>>>> decided against a 3rd hook is that I suppose the code here isn't
>>>> HAP-specific (while prior to this patch it was).
>>>
>>> I'm not convinced the guest TLB flush needs to be performed while
>>> holding the paging lock. The point of such flush is to invalidate any
>>> intermediate guest visible translations that might now be invalid as a
>>> result of the p2m change, but the paging lock doesn't affect the guest
>>> in any way.
>>>
>>> It's true that the dirty_cpumask might change, but I think we only
>>> care that when returning from the function there are no stale cache
>>> entries that contain the now invalid translation, and this can be
>>> achieved equally by doing the flush outside of the locked region.
>>
>> I agree with all this. If only it was merely about TLB flushes. In
>> the shadow case, shadow_blow_all_tables() gets invoked, and that
>> one - looking at the other call sites - wants the paging lock held.
> 
> You got me confused here, I think you meant shadow_blow_tables?

Oh, yes, sorry - copy-and-paste from the wrong source.

> The post hook for shadow could pick the lock again, as I don't think
> the removal of the tables needs to be strictly done inside of the same
> locked region?

I think it does, or else a use of the now stale tables may occur
before they got blown away. Tim?

> Something to consider from a performance PoV.
> 
>> Additionally moving the stuff out of the locked region wouldn't
>> allow us to achieve the goal of moving the nested flush into the
>> hook, unless we introduced further hook handlers to be installed
>> on the host p2m-s of nested guests.
> 
> Right, or else we would also need to add that chunk in the
> non-nestedp2m hook also?

I'm a little confused by the question: If we wanted to move this
into the hook functions, it would need to be both hap's and
shadow's, i.e. _only_ the non-nested ones. IOW it could then
stay in hap's and be duplicated into shadow's. Avoiding the
duplication _and_ keeping it outside the locked region is why I
moved it into the common logic (provided, of course, I'm right
with my understanding of it also being needed in the shadow
case; else it could stay in the hap function alone), of which
the 2nd aspect would go away if the hook invocation itself lived
outside the locked region. But the duplication of this would
still concern me ...

> Maybe you could join both the nested and non-nested hooks and use a
> different dirty bitmap for the flush?

What would this gain us? Extra conditionals in a hook, when the
hook is (indirectly) to avoid having endless conditionals in the
common logic?

Jan

Tim Deegan Nov. 12, 2020, 5:52 p.m. UTC | #8

At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
> On 12.11.2020 14:07, Roger Pau Monné wrote:
> > On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> >> I agree with all this. If only it was merely about TLB flushes. In
> >> the shadow case, shadow_blow_all_tables() gets invoked, and that
> >> one - looking at the other call sites - wants the paging lock held.
[...]
> > The post hook for shadow could pick the lock again, as I don't think
> > the removal of the tables needs to be strictly done inside of the same
> > locked region?
> 
> I think it does, or else a use of the now stale tables may occur
> before they got blown away. Tim?

Is this the call to shadow_blow_tables() in the write_p2m_entry path?
I think it would be safe to drop and re-take the paging lock there as
long as the call happens before the write is considered to have
finished.

But it would not be a useful performance improvement - any update that
takes this path is going to be very slow regardless.  So unless you
have another pressing reason to split it up, I would be inclined to
leave it as it is.  That way it's easier to see that the locking is
correct.

Cheers,

Tim.

Jan Beulich Nov. 13, 2020, 9:52 a.m. UTC | #9

On 12.11.2020 18:52, Tim Deegan wrote:
> At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
>> On 12.11.2020 14:07, Roger Pau Monné wrote:
>>> On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
>>>> I agree with all this. If only it was merely about TLB flushes. In
>>>> the shadow case, shadow_blow_all_tables() gets invoked, and that
>>>> one - looking at the other call sites - wants the paging lock held.
> [...]
>>> The post hook for shadow could pick the lock again, as I don't think
>>> the removal of the tables needs to be strictly done inside of the same
>>> locked region?
>>
>> I think it does, or else a use of the now stale tables may occur
>> before they got blown away. Tim?
> 
> Is this the call to shadow_blow_tables() in the write_p2m_entry path?

Yes.

> I think it would be safe to drop and re-take the paging lock there as
> long as the call happens before the write is considered to have
> finished.
> 
> But it would not be a useful performance improvement - any update that
> takes this path is going to be very slow regardless.  So unless you
> have another pressing reason to split it up, I would be inclined to
> leave it as it is.  That way it's easier to see that the locking is
> correct.

Thanks for the clarification.

Roger - what's your view at this point?

Jan

Roger Pau Monné Nov. 18, 2020, 9:22 a.m. UTC | #10

On Fri, Nov 13, 2020 at 10:52:58AM +0100, Jan Beulich wrote:
> On 12.11.2020 18:52, Tim Deegan wrote:
> > At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
> >> On 12.11.2020 14:07, Roger Pau Monné wrote:
> >>> On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> >>>> I agree with all this. If only it was merely about TLB flushes. In
> >>>> the shadow case, shadow_blow_all_tables() gets invoked, and that
> >>>> one - looking at the other call sites - wants the paging lock held.
> > [...]
> >>> The post hook for shadow could pick the lock again, as I don't think
> >>> the removal of the tables needs to be strictly done inside of the same
> >>> locked region?
> >>
> >> I think it does, or else a use of the now stale tables may occur
> >> before they got blown away. Tim?
> > 
> > Is this the call to shadow_blow_tables() in the write_p2m_entry path?
> 
> Yes.
> 
> > I think it would be safe to drop and re-take the paging lock there as
> > long as the call happens before the write is considered to have
> > finished.
> > 
> > But it would not be a useful performance improvement - any update that
> > takes this path is going to be very slow regardless.  So unless you
> > have another pressing reason to split it up, I would be inclined to
> > leave it as it is.  That way it's easier to see that the locking is
> > correct.
> 
> Thanks for the clarification.
> 
> Roger - what's your view at this point?

So my main concern was not really about making this path faster -
after all this shouldn't be a hot path anyway, but rather reducing the
region protected by the paging lock, since it's a global lock that's
quite contended. In the HAP case we could move the flush outside of
the locked region thus reducing the lock time.

Anyway, seeing there's not much consensus on this aspect leaving it
as-is is no worse that what's currently in there.

Roger.

Roger Pau Monné Nov. 18, 2020, 9:31 a.m. UTC | #11

On Wed, Oct 28, 2020 at 10:24:53AM +0100, Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back the the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
> of a p2m it was meant to be associated with.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

[5/5] x86/p2m: split write_p2m_entry() hook

Commit Message

Comments

Patch