
[2/2] domain: use PGC_extra domheap page for shared_info

Message ID 20200225095357.3923-3-pdurrant@amazon.com (mailing list archive)
State Superseded
Series: remove one more shared xenheap page: shared_info

Commit Message

Paul Durrant Feb. 25, 2020, 9:53 a.m. UTC
There's no particular reason shared_info needs to use a xenheap page. Its
only purpose is to be mapped by the guest, so use a PGC_extra domheap page
instead. This does entail freeing shared_info during
domain_relinquish_resources() rather than domain_destroy(), so care is
needed to avoid de-referencing a NULL shared_info pointer; hence some
extra checks of 'is_dying' are needed.
ASSERTions are added before apparently vulnerable dereferences in the
event channel code. These should not be hit, because domain_kill() calls
evtchn_destroy() before calling domain_relinquish_resources(), but are
added to catch any future re-ordering.
For Arm, the call to free_shared_info() in arch_domain_destroy() is left
in place since it is called in the error path for arch_domain_create().

NOTE: A modification in p2m_alloc_table() is required to avoid a false
      positive when checking for domain memory.
      A fix is also needed in dom0_construct_pv() to avoid automatically
      adding PGC_extra pages to the physmap.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien@xen.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wl@xen.org>
---
 xen/arch/arm/domain.c        |  2 ++
 xen/arch/x86/domain.c        |  3 ++-
 xen/arch/x86/mm/p2m.c        |  3 +--
 xen/arch/x86/pv/dom0_build.c |  4 ++++
 xen/common/domain.c          | 25 +++++++++++++++++++------
 xen/common/event_2l.c        |  4 ++++
 xen/common/event_fifo.c      |  1 +
 xen/common/time.c            |  3 +++
 8 files changed, 36 insertions(+), 9 deletions(-)

Comments

Julien Grall Feb. 26, 2020, 11:33 a.m. UTC | #1
Hi Paul,

On 25/02/2020 09:53, Paul Durrant wrote:
> There's no particular reason shared_info need use a xenheap page.

AFAICT, a side-effect of this change is that shared_info is now going to 
be part of the migration stream. One issue with this is that, on the restore 
side, it will be accounted against the number of pages allocated to the domain.

I am fairly certain dirty logging on pages with PGC_extra set would not 
work properly either.

As the pages will be recreated on the restore side, I would suggest 
skipping them in XEN_DOMCTL_getpageframeinfo3.

> It's
> only purpose is to be mapped by the guest so use a PGC_extra domheap page
> instead. This does entail freeing shared_info during
> domain_relinquish_resources() rather than domain_destroy() so care is
> needed to avoid de-referencing a NULL shared_info pointer hence some
> extra checks of 'is_dying' are needed.
> ASSERTions are added before apparently vulnerable dereferences in the
> event channel code. These should not be hit because domain_kill() calls
> evtchn_destroy() before calling domain_relinquish_resources() but are
> added to catch any future re-ordering.

IMHO, the ASSERTions in the event channel pending/mask/... helpers will 
not protect against re-ordering: you may never hit them even if there is 
a re-ordering. It would be better to add an ASSERT()/BUG_ON() in 
evtchn_destroy() and possibly a comment in domain_kill() to mention the 
ordering.
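
To make that concrete, a rough sketch of the suggestion (not part of the
posted patch; placement within domain_kill()/evtchn_destroy() is only
approximate):

    /* In domain_kill(): note the ordering the event channel code relies on. */

    /*
     * evtchn_destroy() still dereferences shared_info, which this series
     * frees from domain_relinquish_resources(); keep this call order.
     */
    evtchn_destroy(d);

    /* In evtchn_destroy(): make any future re-ordering fail loudly. */
    ASSERT(d->shared_info.virt);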

> For Arm, the call to free_shared_info() in arch_domain_destroy() is left
> in place since it called in the error path for arch_domain_create().
> 
> NOTE: A modification in p2m_alloc_table() is required to avoid a false
>        positive when checking for domain memory.
>        A fix is also needed in dom0_construct_pv() to avoid automatically
>        adding PGC_extra pages to the physmap.

I am not entirely sure how this is related to this patch. Was it a 
latent bug? If so, I think it would make sense to split it from this patch.

> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien@xen.org>
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Wei Liu <wl@xen.org>
> ---
>   xen/arch/arm/domain.c        |  2 ++
>   xen/arch/x86/domain.c        |  3 ++-
>   xen/arch/x86/mm/p2m.c        |  3 +--
>   xen/arch/x86/pv/dom0_build.c |  4 ++++
>   xen/common/domain.c          | 25 +++++++++++++++++++------
>   xen/common/event_2l.c        |  4 ++++
>   xen/common/event_fifo.c      |  1 +
>   xen/common/time.c            |  3 +++
>   8 files changed, 36 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 2cbcdaac08..3904519256 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
>           BUG();
>       }
>   
> +    free_shared_info(d);
> +
>       return 0;
>   }
>   
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index eb7b0fc51c..3ad532eccf 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
>           pv_domain_destroy(d);
>       free_perdomain_mappings(d);
>   
> -    free_shared_info(d);
>       cleanup_domain_irq_mapping(d);
>   
>       psr_domain_free(d);
> @@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
>       if ( is_hvm_domain(d) )
>           hvm_domain_relinquish_resources(d);
>   
> +    free_shared_info(d);
> +
>       return 0;
>   }
>   
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index c5f428d67c..787d97d85e 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
>   
>       p2m_lock(p2m);
>   
> -    if ( p2m_is_hostp2m(p2m)
> -         && !page_list_empty(&d->page_list) )
> +    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
>       {
>           P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
>           p2m_unlock(p2m);
> diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> index dc16ef2e79..f8f1bbe2f4 100644
> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
>       {
>           mfn = mfn_x(page_to_mfn(page));
>           BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
> +
> +        if ( page->count_info & PGC_extra )
> +            continue;

I would add a comment explaining why we skip pages with PGC_extra set.
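
For example (wording is only a suggestion, not taken from the posted
patch), the hunk could read:

    mfn = mfn_x(page_to_mfn(page));
    BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));

    /*
     * PGC_extra pages (here, shared_info) are assigned to the domain but
     * are not ordinary RAM, so they must not be automatically entered
     * into the physmap by this loop.
     */
    if ( page->count_info & PGC_extra )
        continue;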

> +
>           if ( get_gpfn_from_mfn(mfn) >= count )
>           {
>               BUG_ON(is_pv_32bit_domain(d));
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index ba7a905258..1d42fbcc0f 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
>   {
>       struct domain *d = v->domain;
>   
> -    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> +    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>                       ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
> -                    : &dummy_vcpu_info);
> +                    : &dummy_vcpu_info;

Without holding domain_lock(), I don't think there is a guarantee that 
is_dying (and therefore the shared_info) is not going to change behind 
your back. So v->vcpu_info may point to garbage.

Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.

As an aside, it would be good to explain in a comment why we are using 
dummy_vcpu_info when the domain is dying.

>       v->vcpu_info_mfn = INVALID_MFN;
>   }
>   
> @@ -1650,24 +1650,37 @@ int continue_hypercall_on_cpu(
>   
>   int alloc_shared_info(struct domain *d, unsigned int memflags)
>   {
> -    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
> +    struct page_info *pg;
> +
> +    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
> +    if ( !pg )
>           return -ENOMEM;
>   
> -    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
> +    if ( !get_page_and_type(pg, d, PGT_writable_page) )
> +        return -ENODATA;

I think the page will never be freed if this fails.
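
One possible shape of the fix (a sketch only; the exact refcount handling
for a page allocated with MEMF_no_refcount would need checking against
the tree) is to undo the allocation before returning:

    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
    if ( !pg )
        return -ENOMEM;

    if ( !get_page_and_type(pg, d, PGT_writable_page) )
    {
        /*
         * Sketch: drop the allocation reference taken by
         * alloc_domheap_page() so the page is freed rather than leaked.
         * Whether put_page_alloc_ref() alone is sufficient depends on
         * the refcount state MEMF_no_refcount leaves the page in.
         */
        put_page_alloc_ref(pg);
        return -ENODATA;
    }
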

> +
> +    d->shared_info.mfn = page_to_mfn(pg);
> +    d->shared_info.virt = __map_domain_page_global(pg);
>   
>       clear_page(d->shared_info.virt);
> -    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
>   
>       return 0;
>   }
>   
>   void free_shared_info(struct domain *d)
>   {
> +    struct page_info *pg;
> +
>       if ( !d->shared_info.virt )
>           return;
>   
> -    free_xenheap_page(d->shared_info.virt);
> +    unmap_domain_page_global(d->shared_info.virt);
>       d->shared_info.virt = NULL;
> +
> +    pg = mfn_to_page(d->shared_info.mfn);
> +
> +    put_page_alloc_ref(pg);
> +    put_page_and_type(pg);
>   }
>   
>   /*
> diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
> index e1dbb860f4..a72fe0232b 100644
> --- a/xen/common/event_2l.c
> +++ b/xen/common/event_2l.c
> @@ -27,6 +27,7 @@ static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
>        * others may require explicit memory barriers.
>        */
>   
> +    ASSERT(d->shared_info.virt);
>       if ( guest_test_and_set_bit(d, port, &shared_info(d, evtchn_pending)) )
>           return;
>   
> @@ -54,6 +55,7 @@ static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
>        * These operations must happen in strict order. Based on
>        * evtchn_2l_set_pending() above.
>        */
> +    ASSERT(d->shared_info.virt);
>       if ( guest_test_and_clear_bit(d, port, &shared_info(d, evtchn_mask)) &&
>            guest_test_bit(d, port, &shared_info(d, evtchn_pending)) &&
>            !guest_test_and_set_bit(d, port / BITS_PER_EVTCHN_WORD(d),
> @@ -67,6 +69,7 @@ static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port)
>   {
>       unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
>   
> +    ASSERT(d->shared_info.virt);
>       ASSERT(port < max_ports);
>       return (port < max_ports &&
>               guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
> @@ -76,6 +79,7 @@ static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port)
>   {
>       unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
>   
> +    ASSERT(d->shared_info.virt);
>       ASSERT(port < max_ports);
>       return (port >= max_ports ||
>               guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> index 230f440f14..e8c6045d72 100644
> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -497,6 +497,7 @@ static void setup_ports(struct domain *d)
>   
>           evtchn = evtchn_from_port(d, port);
>   
> +        ASSERT(d->shared_info.virt);
>           if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
>               evtchn->pending = 1;
>   
> diff --git a/xen/common/time.c b/xen/common/time.c
> index 58fa9abc40..938226c7b1 100644
> --- a/xen/common/time.c
> +++ b/xen/common/time.c
> @@ -99,6 +99,9 @@ void update_domain_wallclock_time(struct domain *d)
>       uint32_t *wc_version;
>       uint64_t sec;
>   
> +    if ( d->is_dying )
> +        return;

This is another instance where I think the use of d->is_dying is not 
safe. I looked at how other places in Xen dealt with d->is_dying.

We don't seem to have a common way to deal with it:
    1) It may be checked under domain_lock() -> No issue with that
    2) It may be checked under d->page_alloc_lock (e.g. assign_pages()). 
The assign_pages() case is fine because it will act as a full barrier. 
So if we happen to be called after relinquish_memory() then we will 
surely have observed d->is_dying. I haven't checked the others.

Some of the instances use neither of the two locks above. We probably 
ought to investigate them (I will add a TODO to my list).

Regarding the two cases here, domain_lock() might be the solution. If we 
are concerned about the contention (it is a spinlock), then we could 
switch the domain_lock() from spinlock to rwlock.
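
To make the suggestion concrete, a minimal sketch for the wallclock case
(assuming domain_lock() is acceptable here at all, and that the freeing
side takes the same lock; neither point is settled, see the replies
below):

    void update_domain_wallclock_time(struct domain *d)
    {
        /* ... local declarations as in the existing function ... */

        domain_lock(d);

        /*
         * Checked under domain_lock() so the relinquish path cannot free
         * shared_info between the check and the accesses below.
         */
        if ( d->is_dying )
        {
            domain_unlock(d);
            return;
        }

        /* ... existing update of the shared_info wallclock fields ... */

        domain_unlock(d);
    }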

Cheers,
Jan Beulich Feb. 26, 2020, 11:43 a.m. UTC | #2
On 26.02.2020 12:33, Julien Grall wrote:
> On 25/02/2020 09:53, Paul Durrant wrote:
>> --- a/xen/common/time.c
>> +++ b/xen/common/time.c
>> @@ -99,6 +99,9 @@ void update_domain_wallclock_time(struct domain *d)
>>       uint32_t *wc_version;
>>       uint64_t sec;
>>   
>> +    if ( d->is_dying )
>> +        return;
> 
> This is another instance where I think the use of d->is_dying is not 
> safe. I looked at how other places in Xen dealt with d->is_dying.
> 
> We don't seem to have a common way to deal with it:
>     1) It may be checked under domain_lock() -> No issue with that
>     2) It may be checked under d->page_alloc_lock (e.g assign_pages()). 
> The assign_pages() case is fine because it will act as a full barrier. 
> So if we call happen after relinquish_memory() then we will surely have 
> observed d->is_dying. I haven't checked the others.
> 
> Some of the instances user neither the 2 locks above. We probably ought 
> to investigate them (I will add a TODO in my list).
> 
> Regarding the two cases here, domain_lock() might be the solution. If we 
> are concern about the contention (it is a spinlock), then we could 
> switch the domain_lock() from spinlock to rwlock.

That's an option, yes, but even better would be to avoid spreading
use of this lock (we did try to remove as many of the uses as
possible several years ago). For d->is_dying I think it is a case-
by-case justification of why a use is safe (the main point, after
all, being that whatever code comes after the check must work
correctly if on some other CPU the flag is about to be / being set).

Jan
Andrew Cooper Feb. 26, 2020, 11:46 a.m. UTC | #3
On 25/02/2020 09:53, Paul Durrant wrote:
> There's no particular reason shared_info need use a xenheap page.

?

There are a number of ABI-critical reasons, not least the evtchn_pending
field which Xen needs to write.

I can see what you're trying to do, and it looks fine in principle, but
the commit message isn't remotely accurate.  Remember that you are in
the process of changing the meaning/usage of the xenheap - preexisting
uses conform to the old meaning, where the sole distinction between
domheap and xenheap pages was whether Xen required a virtual mapping or
not.  The answer is "absolutely yes" for the shared info page.

~Andrew
Durrant, Paul Feb. 26, 2020, 11:53 a.m. UTC | #4
> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 26 February 2020 11:46
> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
> <julien@xen.org>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; George
> Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> On 25/02/2020 09:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page.
> 

Ok, 'particular' is too vague, agreed. I'll elaborate.

> ?
> 
> There are a number of ABI-critical reasons, not least the evtchn_pending
> field which Xen needs to write.
> 

I do address this specifically in the commit comment.

"ASSERTions are added before apparently vulnerable dereferences in the
event channel code. These should not be hit because domain_kill() calls
evtchn_destroy() before calling domain_relinquish_resources() but are
added to catch any future re-ordering."

> I can see what you're trying to do, and it looks fine in principle, but
> the commit message isn't remotely accurate.  Remember that you are in
> the process of changing the meaning/usage of the xenheap - preexiting
> uses conform to the old meaning, where the sole distinction between
> domheap and xenheap pages was whether Xen required a virtual mapping or
> not.  The answer is "absolutely yes" for the shared info page.
> 

Was that the case? I haven't mined the history to find when map_domain_page() was introduced. At that point, of course, any strict 'being virtually mapped' distinction between xenheap and domheap was rendered inaccurate.

  Paul
Durrant, Paul Feb. 26, 2020, 12:03 p.m. UTC | #5
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 26 February 2020 11:33
> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> Hi Paul,
> 
> On 25/02/2020 09:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page.
> 
> AFAICT, a side-effect of this change is the shared_info is now going to
> be part of the migration stream. One issue with this is on the restore
> side, they will be accounted in number of pages allocated to the domain.
> 
> I am fairly certain dirty logging on page with PGC_extra set would not
> work properly as well.

True, those two do need fixing before expanding the use of PGC_extra. I guess this may already be a problem with the current VMX use case.

> 
> As the pages will be recreated in the restore side, I would suggest to
> skip them in XEN_DOMCTL_getpageframeinfo3.
>

At some point in the future I guess it may be useful to migrate PGC_extra pages, but they would need to be marked as such in the stream. Skipping sounds like the right approach for now.
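
A rough sketch of that skip inside the XEN_DOMCTL_getpageframeinfo3
handler (the surrounding code and placement are approximations, not taken
from any posted patch):

    /*
     * PGC_extra pages are recreated on the restore side, so report them
     * as XTAB to keep them out of the migration stream.
     */
    if ( page && (page->count_info & PGC_extra) )
        type = XEN_DOMCTL_PFINFO_XTAB;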

> > It's
> > only purpose is to be mapped by the guest so use a PGC_extra domheap
> page
> > instead. This does entail freeing shared_info during
> > domain_relinquish_resources() rather than domain_destroy() so care is
> > needed to avoid de-referencing a NULL shared_info pointer hence some
> > extra checks of 'is_dying' are needed.
> > ASSERTions are added before apparently vulnerable dereferences in the
> > event channel code. These should not be hit because domain_kill() calls
> > evtchn_destroy() before calling domain_relinquish_resources() but are
> > added to catch any future re-ordering.
> 
> IHMO, the ASSERTions in the event channel pending/mask/... helpers will
> not protect against re-ordering. You may never hit them even if there is
> a re-ordering. It would be better to add an ASSERT()/BUG_ON() in
> evtchn_destroy() and possibly a comment in domain_kill() to mention
> ordering.

I'm not sure about that. Why would they not be hit?

> 
> > For Arm, the call to free_shared_info() in arch_domain_destroy() is left
> > in place since it called in the error path for arch_domain_create().
> >
> > NOTE: A modification in p2m_alloc_table() is required to avoid a false
> >        positive when checking for domain memory.
> >        A fix is also needed in dom0_construct_pv() to avoid
> automatically
> >        adding PGC_extra pages to the physmap.
> 
> I am not entirely sure how this is related to this patch. Was it a
> latent bug? If so, I think it would make sense to split it from this
> patch.
> 

It's related because the check relies on the fact that no PGC_extra pages are created before it is called. This is indeed true until this patch is applied.

> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <george.dunlap@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Wei Liu <wl@xen.org>
> > ---
> >   xen/arch/arm/domain.c        |  2 ++
> >   xen/arch/x86/domain.c        |  3 ++-
> >   xen/arch/x86/mm/p2m.c        |  3 +--
> >   xen/arch/x86/pv/dom0_build.c |  4 ++++
> >   xen/common/domain.c          | 25 +++++++++++++++++++------
> >   xen/common/event_2l.c        |  4 ++++
> >   xen/common/event_fifo.c      |  1 +
> >   xen/common/time.c            |  3 +++
> >   8 files changed, 36 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 2cbcdaac08..3904519256 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
> >           BUG();
> >       }
> >
> > +    free_shared_info(d);
> > +
> >       return 0;
> >   }
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index eb7b0fc51c..3ad532eccf 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
> >           pv_domain_destroy(d);
> >       free_perdomain_mappings(d);
> >
> > -    free_shared_info(d);
> >       cleanup_domain_irq_mapping(d);
> >
> >       psr_domain_free(d);
> > @@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
> >       if ( is_hvm_domain(d) )
> >           hvm_domain_relinquish_resources(d);
> >
> > +    free_shared_info(d);
> > +
> >       return 0;
> >   }
> >
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index c5f428d67c..787d97d85e 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> >
> >       p2m_lock(p2m);
> >
> > -    if ( p2m_is_hostp2m(p2m)
> > -         && !page_list_empty(&d->page_list) )
> > +    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
> >       {
> >           P2M_ERROR("dom %d already has memory allocated\n", d-
> >domain_id);
> >           p2m_unlock(p2m);
> > diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> > index dc16ef2e79..f8f1bbe2f4 100644
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
> >       {
> >           mfn = mfn_x(page_to_mfn(page));
> >           BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
> > +
> > +        if ( page->count_info & PGC_extra )
> > +            continue;
> 
> I would add a comment explaining why we skip page with PGC_extra set.
> 
> > +
> >           if ( get_gpfn_from_mfn(mfn) >= count )
> >           {
> >               BUG_ON(is_pv_32bit_domain(d));
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index ba7a905258..1d42fbcc0f 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
> >   {
> >       struct domain *d = v->domain;
> >
> > -    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> > +    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> >                       ? (vcpu_info_t *)&shared_info(d, vcpu_info[v-
> >vcpu_id])
> > -                    : &dummy_vcpu_info);
> > +                    : &dummy_vcpu_info;
> 
> Without holding domain_lock(), I don't think there is a guarantee that
> is_dying (and therefore the shared_info) is not going to change behind
> your back. So v->vcpu_info may point to garbagge.
> 
> Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.
> 

True, it does need locking in some way.

> As an aside, it would be good to explain in a comment why we are using
> dummy_vcpu_info when the domain is dying.
> 

Simply because it is guaranteed to exist, but I'll add a comment to that effect.
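
For illustration, the comment might read something like this (wording is
mine, not from a posted patch):

    /*
     * During domain destruction shared_info may already have been freed
     * by domain_relinquish_resources(), so fall back to the statically
     * allocated dummy_vcpu_info, which is guaranteed to exist.
     */
    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
                   ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
                   : &dummy_vcpu_info;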

> >       v->vcpu_info_mfn = INVALID_MFN;
> >   }
> >
> > @@ -1650,24 +1650,37 @@ int continue_hypercall_on_cpu(
> >
> >   int alloc_shared_info(struct domain *d, unsigned int memflags)
> >   {
> > -    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) ==
> NULL )
> > +    struct page_info *pg;
> > +
> > +    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
> > +    if ( !pg )
> >           return -ENOMEM;
> >
> > -    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
> > +    if ( !get_page_and_type(pg, d, PGT_writable_page) )
> > +        return -ENODATA;
> 
> I think the page will never be freed if this fails.
>

Good spot.

> > +
> > +    d->shared_info.mfn = page_to_mfn(pg);
> > +    d->shared_info.virt = __map_domain_page_global(pg);
> >
> >       clear_page(d->shared_info.virt);
> > -    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d,
> SHARE_rw);
> >
> >       return 0;
> >   }
> >
> >   void free_shared_info(struct domain *d)
> >   {
> > +    struct page_info *pg;
> > +
> >       if ( !d->shared_info.virt )
> >           return;
> >
> > -    free_xenheap_page(d->shared_info.virt);
> > +    unmap_domain_page_global(d->shared_info.virt);
> >       d->shared_info.virt = NULL;
> > +
> > +    pg = mfn_to_page(d->shared_info.mfn);
> > +
> > +    put_page_alloc_ref(pg);
> > +    put_page_and_type(pg);
> >   }
> >
> >   /*
> > diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
> > index e1dbb860f4..a72fe0232b 100644
> > --- a/xen/common/event_2l.c
> > +++ b/xen/common/event_2l.c
> > @@ -27,6 +27,7 @@ static void evtchn_2l_set_pending(struct vcpu *v,
> struct evtchn *evtchn)
> >        * others may require explicit memory barriers.
> >        */
> >
> > +    ASSERT(d->shared_info.virt);
> >       if ( guest_test_and_set_bit(d, port, &shared_info(d,
> evtchn_pending)) )
> >           return;
> >
> > @@ -54,6 +55,7 @@ static void evtchn_2l_unmask(struct domain *d, struct
> evtchn *evtchn)
> >        * These operations must happen in strict order. Based on
> >        * evtchn_2l_set_pending() above.
> >        */
> > +    ASSERT(d->shared_info.virt);
> >       if ( guest_test_and_clear_bit(d, port, &shared_info(d,
> evtchn_mask)) &&
> >            guest_test_bit(d, port, &shared_info(d, evtchn_pending)) &&
> >            !guest_test_and_set_bit(d, port / BITS_PER_EVTCHN_WORD(d),
> > @@ -67,6 +69,7 @@ static bool evtchn_2l_is_pending(const struct domain
> *d, evtchn_port_t port)
> >   {
> >       unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) *
> BITS_PER_EVTCHN_WORD(d);
> >
> > +    ASSERT(d->shared_info.virt);
> >       ASSERT(port < max_ports);
> >       return (port < max_ports &&
> >               guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
> > @@ -76,6 +79,7 @@ static bool evtchn_2l_is_masked(const struct domain
> *d, evtchn_port_t port)
> >   {
> >       unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) *
> BITS_PER_EVTCHN_WORD(d);
> >
> > +    ASSERT(d->shared_info.virt);
> >       ASSERT(port < max_ports);
> >       return (port >= max_ports ||
> >               guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
> > diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> > index 230f440f14..e8c6045d72 100644
> > --- a/xen/common/event_fifo.c
> > +++ b/xen/common/event_fifo.c
> > @@ -497,6 +497,7 @@ static void setup_ports(struct domain *d)
> >
> >           evtchn = evtchn_from_port(d, port);
> >
> > +        ASSERT(d->shared_info.virt);
> >           if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending))
> )
> >               evtchn->pending = 1;
> >
> > diff --git a/xen/common/time.c b/xen/common/time.c
> > index 58fa9abc40..938226c7b1 100644
> > --- a/xen/common/time.c
> > +++ b/xen/common/time.c
> > @@ -99,6 +99,9 @@ void update_domain_wallclock_time(struct domain *d)
> >       uint32_t *wc_version;
> >       uint64_t sec;
> >
> > +    if ( d->is_dying )
> > +        return;
> 
> This is another instance where I think the use of d->is_dying is not
> safe. I looked at how other places in Xen dealt with d->is_dying.
> 
> We don't seem to have a common way to deal with it:
>     1) It may be checked under domain_lock() -> No issue with that
>     2) It may be checked under d->page_alloc_lock (e.g assign_pages()).
> The assign_pages() case is fine because it will act as a full barrier.
> So if we call happen after relinquish_memory() then we will surely have
> observed d->is_dying. I haven't checked the others.
> 
> Some of the instances user neither the 2 locks above. We probably ought
> to investigate them (I will add a TODO in my list).
> 
> Regarding the two cases here, domain_lock() might be the solution. If we
> are concern about the contention (it is a spinlock), then we could
> switch the domain_lock() from spinlock to rwlock.
> 

I'll see if there is any other suitable lock guaranteed to be taken during domain_kill() but it may indeed have to be domain_lock().

  Paul

> Cheers,
> 
> --
> Julien Grall
Jan Beulich Feb. 26, 2020, 1:57 p.m. UTC | #6
On 25.02.2020 10:53, Paul Durrant wrote:
> There's no particular reason shared_info need use a xenheap page. It's
> only purpose is to be mapped by the guest so use a PGC_extra domheap page
> instead.

Since the cover letter also doesn't give any background - is there a
problem with the current arrangements? Are there any further plans
depending on this being changed? Or is this simply "let's do it
because now we can"?

Jan
Durrant, Paul Feb. 26, 2020, 2:05 p.m. UTC | #7
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 26 February 2020 13:58
> To: Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> On 25.02.2020 10:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page. It's
> > only purpose is to be mapped by the guest so use a PGC_extra domheap
> page
> > instead.
> 
> Since the cover letter also doesn't give any background - is there a
> problem with the current arrangements? Are there any further plans
> depending on this being changed? Or is this simply "let's do it
> because now we can"?
> 

The general direction is to get rid of shared xenheap pages. Knowing that a xenheap page is not shared with a guest makes dealing with live update much easier, and also allows a chunk of code to be removed from domain_relinquish_resources() (the shared xenheap walk which de-assigns them from the dying guest).
The only xenheap pages shared with a normal domU (as opposed to a system domain, which would be re-created on live update anyway) are AFAICT shared-info and grant table/status frames. This series deals with shared-info but I do have proto-patches for the grant table code too.

  Paul
Julien Grall Feb. 26, 2020, 2:22 p.m. UTC | #8
On 26/02/2020 12:03, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 26 February 2020 11:33
>> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Volodymyr Babchuk
>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson
>> <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>; Konrad
>> Rzeszutek Wilk <konrad.wilk@oracle.com>; Wei Liu <wl@xen.org>
>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
>> shared_info
>>
>> Hi Paul,
>>
>> On 25/02/2020 09:53, Paul Durrant wrote:
>>> There's no particular reason shared_info need use a xenheap page.
>>
>> AFAICT, a side-effect of this change is the shared_info is now going to
>> be part of the migration stream. One issue with this is on the restore
>> side, they will be accounted in number of pages allocated to the domain.
>>
>> I am fairly certain dirty logging on page with PGC_extra set would not
>> work properly as well.
> 
> True, those two do need fixing before expanding the use of PGC_extra. I guess this may already be a problem with the current VMX use case.
> 
>>
>> As the pages will be recreated in the restore side, I would suggest to
>> skip them in XEN_DOMCTL_getpageframeinfo3.
>>
> 
> At some point in future I guess it may be useful to migrate PGC_extra pages, but they would need to be marked as such in the stream. Skipping sounds like the right approach for now.
> 
>>> It's
>>> only purpose is to be mapped by the guest so use a PGC_extra domheap
>> page
>>> instead. This does entail freeing shared_info during
>>> domain_relinquish_resources() rather than domain_destroy() so care is
>>> needed to avoid de-referencing a NULL shared_info pointer hence some
>>> extra checks of 'is_dying' are needed.
>>> ASSERTions are added before apparently vulnerable dereferences in the
>>> event channel code. These should not be hit because domain_kill() calls
>>> evtchn_destroy() before calling domain_relinquish_resources() but are
>>> added to catch any future re-ordering.
>>
>> IHMO, the ASSERTions in the event channel pending/mask/... helpers will
>> not protect against re-ordering. You may never hit them even if there is
>> a re-ordering. It would be better to add an ASSERT()/BUG_ON() in
>> evtchn_destroy() and possibly a comment in domain_kill() to mention
>> ordering.
> 
> I'm not sure about that. Why would they not be hit?

A developer may never touch the event channels during domain 
destruction in a debug build, so the re-ordering would go unnoticed.

This would become quite a problem if a production build ended up 
touching event channels during domain destruction.

> 
>>
>>> For Arm, the call to free_shared_info() in arch_domain_destroy() is left
>>> in place since it called in the error path for arch_domain_create().
>>>
>>> NOTE: A modification in p2m_alloc_table() is required to avoid a false
>>>         positive when checking for domain memory.
>>>         A fix is also needed in dom0_construct_pv() to avoid
>> automatically
>>>         adding PGC_extra pages to the physmap.
>>
>> I am not entirely sure how this is related to this patch. Was it a
>> latent bug? If so, I think it would make sense to split it from this
>> patch.
>>
> 
> It's related because the check relies on the fact that no PGC_extra pages are created before it is called. This is indeed true until this patch is applied.

I would prefer if they were fixed in separate patches, as this patch 
already does quite a lot. But I will leave that up to Andrew and Jan.

>>>
>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>> ---
>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>> Cc: Julien Grall <julien@xen.org>
>>> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: George Dunlap <george.dunlap@citrix.com>
>>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Cc: Wei Liu <wl@xen.org>
>>> ---
>>>    xen/arch/arm/domain.c        |  2 ++
>>>    xen/arch/x86/domain.c        |  3 ++-
>>>    xen/arch/x86/mm/p2m.c        |  3 +--
>>>    xen/arch/x86/pv/dom0_build.c |  4 ++++
>>>    xen/common/domain.c          | 25 +++++++++++++++++++------
>>>    xen/common/event_2l.c        |  4 ++++
>>>    xen/common/event_fifo.c      |  1 +
>>>    xen/common/time.c            |  3 +++
>>>    8 files changed, 36 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>> index 2cbcdaac08..3904519256 100644
>>> --- a/xen/arch/arm/domain.c
>>> +++ b/xen/arch/arm/domain.c
>>> @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
>>>            BUG();
>>>        }
>>>
>>> +    free_shared_info(d);
>>> +
>>>        return 0;
>>>    }
>>>
>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>> index eb7b0fc51c..3ad532eccf 100644
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
>>>            pv_domain_destroy(d);
>>>        free_perdomain_mappings(d);
>>>
>>> -    free_shared_info(d);
>>>        cleanup_domain_irq_mapping(d);
>>>
>>>        psr_domain_free(d);
>>> @@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
>>>        if ( is_hvm_domain(d) )
>>>            hvm_domain_relinquish_resources(d);
>>>
>>> +    free_shared_info(d);
>>> +
>>>        return 0;
>>>    }
>>>
>>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>>> index c5f428d67c..787d97d85e 100644
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -695,8 +695,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
>>>
>>>        p2m_lock(p2m);
>>>
>>> -    if ( p2m_is_hostp2m(p2m)
>>> -         && !page_list_empty(&d->page_list) )
>>> +    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
>>>        {
>>>            P2M_ERROR("dom %d already has memory allocated\n", d-
>>> domain_id);
>>>            p2m_unlock(p2m);
>>> diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
>>> index dc16ef2e79..f8f1bbe2f4 100644
>>> --- a/xen/arch/x86/pv/dom0_build.c
>>> +++ b/xen/arch/x86/pv/dom0_build.c
>>> @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
>>>        {
>>>            mfn = mfn_x(page_to_mfn(page));
>>>            BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
>>> +
>>> +        if ( page->count_info & PGC_extra )
>>> +            continue;
>>
>> I would add a comment explaining why we skip page with PGC_extra set.
>>
>>> +
>>>            if ( get_gpfn_from_mfn(mfn) >= count )
>>>            {
>>>                BUG_ON(is_pv_32bit_domain(d));
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index ba7a905258..1d42fbcc0f 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -128,9 +128,9 @@ static void vcpu_info_reset(struct vcpu *v)
>>>    {
>>>        struct domain *d = v->domain;
>>>
>>> -    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>>> +    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
>>>                        ? (vcpu_info_t *)&shared_info(d, vcpu_info[v-
>>> vcpu_id])
>>> -                    : &dummy_vcpu_info);
>>> +                    : &dummy_vcpu_info;
>>
>> Without holding domain_lock(), I don't think there is a guarantee that
>> is_dying (and therefore the shared_info) is not going to change behind
>> your back. So v->vcpu_info may point to garbagge.
>>
>> Looking at the callers, XEN_DOMCTL_soft_reset will not hold the lock.
>>
> 
> True, it does need locking in some way.
> 
>> As an aside, it would be good to explain in a comment why we are using
>> dummy_vcpu_info when the domain is dying.
>>
> 
> Simply because it is guaranteed to exist, but I'll add a comment to that effect.

Well, the question is what will happen if you end up accessing it. Could 
you inadvertently write from domain A vCPU A and then read from domain B 
vCPU B?

Such a problem is already present technically. But I would be fairly 
concerned if that's ever possible during domain destroy. So what are we 
trying to protect against here? Is it just code hardening?

[...]

Cheers,
Jan Beulich Feb. 26, 2020, 2:44 p.m. UTC | #9
On 26.02.2020 15:22, Julien Grall wrote:
> On 26/02/2020 12:03, Durrant, Paul wrote:
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 26 February 2020 11:33
>>>
>>> On 25/02/2020 09:53, Paul Durrant wrote:
>>>> For Arm, the call to free_shared_info() in arch_domain_destroy() is left
>>>> in place since it called in the error path for arch_domain_create().
>>>>
>>>> NOTE: A modification in p2m_alloc_table() is required to avoid a false
>>>>         positive when checking for domain memory.
>>>>         A fix is also needed in dom0_construct_pv() to avoid
>>> automatically
>>>>         adding PGC_extra pages to the physmap.
>>>
>>> I am not entirely sure how this is related to this patch. Was it a
>>> latent bug? If so, I think it would make sense to split it from this
>>> patch.
>>>
>>
>> It's related because the check relies on the fact that no PGC_extra pages are created before it is called. This is indeed true until this patch is applied.
> 
> I would prefer if they are fixed in separate patches as this pach 
> already quite a lot. But I will leave that up to Andrew and Jan.

I agree with Julien (if splitting is sensibly possible), fwiw.

Jan
Jan Beulich Feb. 26, 2020, 3:23 p.m. UTC | #10
On 26.02.2020 15:05, Durrant, Paul wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 26 February 2020 13:58
>>
>> On 25.02.2020 10:53, Paul Durrant wrote:
>>> There's no particular reason shared_info need use a xenheap page. It's
>>> only purpose is to be mapped by the guest so use a PGC_extra domheap
>> page
>>> instead.
>>
>> Since the cover letter also doesn't give any background - is there a
>> problem with the current arrangements? Are there any further plans
>> depending on this being changed? Or is this simply "let's do it
>> because now we can"?
>>
> 
> The general direction is to get rid of shared xenheap pages. Knowing
> that a xenheap page is not shared with a guest makes dealing with
> live update much easier,

I may not be seeing enough of the overall picture, but it would seem
to me that the special treatment of shared Xen heap pages would then
be replaced by special treatment of PGC_extra ones.

Jan
Julien Grall Feb. 26, 2020, 4:12 p.m. UTC | #11
(+David)

On 26/02/2020 15:23, Jan Beulich wrote:
> On 26.02.2020 15:05, Durrant, Paul wrote:
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 26 February 2020 13:58
>>>
>>> On 25.02.2020 10:53, Paul Durrant wrote:
>>>> There's no particular reason shared_info need use a xenheap page. It's
>>>> only purpose is to be mapped by the guest so use a PGC_extra domheap
>>> page
>>>> instead.
>>>
>>> Since the cover letter also doesn't give any background - is there a
>>> problem with the current arrangements? Are there any further plans
>>> depending on this being changed? Or is this simply "let's do it
>>> because now we can"?
>>>
>>
>> The general direction is to get rid of shared xenheap pages. Knowing
>> that a xenheap page is not shared with a guest makes dealing with
>> live update much easier,
> 
> I may not be seeing enough of the overall picture, but it would seem
> to me that the special treatment of shared Xen heap pages would then
> be replaced by special treatment of PGC_extra ones.

I have been working on live update for the past couple of months and I don't 
really see how this is going to make live update easier. We will still 
need to save the extra flags and extra records for each subsystem using 
them (e.g. grant tables).

I have CCed David to see if he has a different opinion.

Cheers,
Woodhouse, David Feb. 26, 2020, 4:53 p.m. UTC | #12
On Wed, 2020-02-26 at 16:12 +0000, Julien Grall wrote:
> (+David)
> 
> On 26/02/2020 15:23, Jan Beulich wrote:
> > On 26.02.2020 15:05, Durrant, Paul wrote:
> > > > From: Jan Beulich <jbeulich@suse.com>
> > > > Sent: 26 February 2020 13:58
> > > > 
> > > > On 25.02.2020 10:53, Paul Durrant wrote:
> > > > > There's no particular reason shared_info need use a xenheap page. It's
> > > > > only purpose is to be mapped by the guest so use a PGC_extra domheap
> > > > 
> > > > page
> > > > > instead.
> > > > 
> > > > Since the cover letter also doesn't give any background - is there a
> > > > problem with the current arrangements? Are there any further plans
> > > > depending on this being changed? Or is this simply "let's do it
> > > > because now we can"?
> > > > 
> > > 
> > > The general direction is to get rid of shared xenheap pages. Knowing
> > > that a xenheap page is not shared with a guest makes dealing with
> > > live update much easier,
> > 
> > I may not be seeing enough of the overall picture, but it would seem
> > to me that the special treatment of shared Xen heap pages would then
> > be replaced by special treatment of PGC_extra ones.
> 
> I have been working on Liveupdate for the past couple months and I don't 
>   really see how this is going to make liveupdate easier. We will still 
> need to save the extra flags and extra records for each subsystem using 
> them (e.g grant-tables).
> 
> I have CCed David to see if he has a different opinion.

For live update we need to give a region of memory to the new Xen which
it can use for its boot allocator, before it's handled any of the live
update records and before it knows which *other* memory is still
available for use.

In order to do that, the original Xen has to ensure that it *doesn't*
use any of that memory region for domain-owned pages which would need
to be preserved.

So far in the patches I've posted upstream I have cheated, and simply
*not* added them to the main heap. Anything allocated before
end_boot_allocator() is fine because it is "ephemeral" to the first Xen
and doesn't need to be preserved (it's mostly frame tables and a few
PTE pages).

Paul's work is making it possible to use those pages as xenheap pages,
safe in the knowledge that they *won't* end up being mapped to domains,
and won't need to be preserved across live update.





Jan Beulich March 6, 2020, 11:37 a.m. UTC | #13
On 26.02.2020 17:53, Woodhouse, David wrote:
> On Wed, 2020-02-26 at 16:12 +0000, Julien Grall wrote:
>> (+David)
>>
>> On 26/02/2020 15:23, Jan Beulich wrote:
>>> On 26.02.2020 15:05, Durrant, Paul wrote:
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: 26 February 2020 13:58
>>>>>
>>>>> On 25.02.2020 10:53, Paul Durrant wrote:
>>>>>> There's no particular reason shared_info need use a xenheap page. It's
>>>>>> only purpose is to be mapped by the guest so use a PGC_extra domheap
>>>>>
>>>>> page
>>>>>> instead.
>>>>>
>>>>> Since the cover letter also doesn't give any background - is there a
>>>>> problem with the current arrangements? Are there any further plans
>>>>> depending on this being changed? Or is this simply "let's do it
>>>>> because now we can"?
>>>>>
>>>>
>>>> The general direction is to get rid of shared xenheap pages. Knowing
>>>> that a xenheap page is not shared with a guest makes dealing with
>>>> live update much easier,
>>>
>>> I may not be seeing enough of the overall picture, but it would seem
>>> to me that the special treatment of shared Xen heap pages would then
>>> be replaced by special treatment of PGC_extra ones.
>>
>> I have been working on Liveupdate for the past couple months and I don't 
>>   really see how this is going to make liveupdate easier. We will still 
>> need to save the extra flags and extra records for each subsystem using 
>> them (e.g grant-tables).
>>
>> I have CCed David to see if he has a different opinion.
> 
> For live update we need to give a region of memory to the new Xen which
> it can use for its boot allocator, before it's handled any of the live
> update records and before it knows which *other* memory is still
> available for use.
> 
> In order to do that, the original Xen has to ensure that it *doesn't*
> use any of that memory region for domain-owned pages which would need
> to be preserved.
> 
> So far in the patches I've posted upstream I have cheated, and simply
> *not* added them to the main heap. Anything allocated before
> end_boot_allocator() is fine because it is "ephemeral" to the first Xen
> and doesn't need to be preserved (it's mostly frame tables and a few
> PTE pages).
> 
> Paul's work is making it possible to use those pages as xenheap pages,
> safe in the knowledge that they *won't* end up being mapped to domains,
> and won't need to be preserved across live update.

I've started looking at the latest version of Paul's series, but I'm
still struggling to see the picture: There's no true distinction
between Xen heap and domain heap on x86-64 (except on very large
systems). Therefore it is unclear to me what "those pages" is actually
referring to above. Surely new Xen can't be given any pages in use
_in any way_ by old Xen, no matter whether it's ones assigned to
domains, or ones used internally to (old) Xen.

Jan
David Woodhouse March 6, 2020, 11:52 a.m. UTC | #14
On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
> I've started looking at the latest version of Paul's series, but I'm
> still struggling to see the picture: There's no true distinction
> between Xen heap and domain heap on x86-64 (except on very large
> systems). Therefore it is unclear to me what "those pages" is actually
> referring to above. Surely new Xen can't be given any pages in use
> _in any way_ by old Xen, no matter whether it's ones assigned to
> domains, or ones used internally to (old) Xen.

Old and new Xen do not coexist. There is a kexec (via kexec_reloc.S and
purgatory) from old to new.

There are some pages which new Xen MUST NOT scribble on, because they
actually belong to the domains being preserved. That includes the EPT
(or at least IOMMU) page tables.

I suppose new Xen also mustn't scribble on the pages in which old Xen
has placed the migration information for those domains either. At
least, not until it's consumed the data.

Anything else, however, is fine for new Xen to scribble on. Fairly much
anything that the old Xen had allocated from its xenheap (and not
subsequently shared to a guest, qv) is no longer required and can be
treated as free memory by the new Xen, which now owns the machine.
Durrant, Paul March 6, 2020, 11:59 a.m. UTC | #15
> -----Original Message-----
> From: David Woodhouse <dwmw2@infradead.org>
> Sent: 06 March 2020 11:53
> To: Jan Beulich <jbeulich@suse.com>; Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: julien@xen.org; andrew.cooper3@citrix.com; sstabellini@kernel.org; konrad.wilk@oracle.com;
> Volodymyr_Babchuk@epam.com; ian.jackson@eu.citrix.com; wl@xen.org; george.dunlap@citrix.com; xen-
> devel@lists.xenproject.org
> Subject: RE: [EXTERNAL][PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
> > I've started looking at the latest version of Paul's series, but I'm
> > still struggling to see the picture: There's no true distinction
> > between Xen heap and domain heap on x86-64 (except on very large
> > systems). Therefore it is unclear to me what "those pages" is actually
> > referring to above. Surely new Xen can't be given any pages in use
> > _in any way_ by old Xen, no matter whether it's ones assigned to
> > domains, or ones used internally to (old) Xen.
> 
> Old and new Xen do not coexist. There is a kexec (via kexec_reloc.S and
> purgatory) from old to new.
> 
> There are some pages which new Xen MUST NOT scribble on, because they
> actually belong to the domains being preserved. That includes the EPT
> (or at least IOMMU) page tables.
> 
> I suppose new Xen also mustn't scribble on the pages in which old Xen
> has placed the migration information for those domains either. At
> least, not until it's consumed the data.
> 
> Anything else, however, is fine for new Xen to scribble on. Fairly much
> anything that the old Xen had allocated from its xenheap (and not
> subsequently shared to a guest, qv) is no longer required and can be
> treated as free memory by the new Xen, which now owns the machine.
> 

... so getting rid of shared xenheap pages altogether just makes life easier.

  Paul
David Woodhouse March 6, 2020, 12:20 p.m. UTC | #16
On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
> > For live update we need to give a region of memory to the new Xen which
> > it can use for its boot allocator, before it's handled any of the live
> > update records and before it knows which *other* memory is still
> > available for use.
> > 
> > In order to do that, the original Xen has to ensure that it *doesn't*
> > use any of that memory region for domain-owned pages which would need
> > to be preserved.
> > 
> > So far in the patches I've posted upstream I have cheated, and simply
> > *not* added them to the main heap. Anything allocated before
> > end_boot_allocator() is fine because it is "ephemeral" to the first Xen
> > and doesn't need to be preserved (it's mostly frame tables and a few
> > PTE pages).
> > 
> > Paul's work is making it possible to use those pages as xenheap pages,
> > safe in the knowledge that they *won't* end up being mapped to domains,
> > and won't need to be preserved across live update.
> 
> I've started looking at the latest version of Paul's series, but I'm
> still struggling to see the picture: There's no true distinction
> between Xen heap and domain heap on x86-64 (except on very large
> systems). Therefore it is unclear to me what "those pages" is actually
> referring to above. Surely new Xen can't be given any pages in use
> _in any way_ by old Xen, no matter whether it's ones assigned to
> domains, or ones used internally to (old) Xen.

Hm, I'm not sure my previous response actually answered your question;
sorry, I've been away all week so it's still Monday morning in my head
right now. Let me try again...

What I said just now is true. The new Xen can use anything that isn't
actually owned by domains, but old Xen is dead and any of its own
internal allocations, Xen page tables and data structures (i.e. most of
what it allocated on its xenheap) have died with it and those pages are
considered 'free' by the new Xen.

Theoretically, it would be possible for the new Xen to go directly to
that state. The live update data could be processed *early* in the new
Xen before the boot allocator is even running, and new Xen could prime
its boot allocator with the memory ranges that happen to be available
according to the criteria outlined above.

Our initial implementation did that, in fact. It was complex in early
boot, and it didn't scale to more than 512 separate free ranges because
the boot allocator panics if it has more free regions than that.

That's why we settled on the model of reserving a specific region for
the new Xen to use for its boot allocator. Old Xen promises that it
won't put anything into that region that needs to be preserved over
kexec, and then the startup process for the new Xen is much simpler; it
can use that contiguous region for its boot allocations and then
process the live update data in a better environment once things like
vmap() are already available Then *finally* it can add the rest of the
system memory that *isn't* used by running domains, into the buddy
allocator.

So this requires old Xen to promise that it *won't* put anything into
that region of reserved bootmem (aka "those pages") that needs to be
preserved across kexec. That promise is *mostly* equivalent to "will
only allocate xenheap pages from those pages"... except for the fact
that sometimes we allocate a page from the xenheap and share it to
domains.

Thus, "don't do that then", and THEN we can say that it's OK for
xenheap allocations to come from the reserved bootmem region, but not
domheap allocations.
Jan Beulich March 6, 2020, 12:25 p.m. UTC | #17
On 06.03.2020 12:52, David Woodhouse wrote:
> On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
>> I've started looking at the latest version of Paul's series, but I'm
>> still struggling to see the picture: There's no true distinction
>> between Xen heap and domain heap on x86-64 (except on very large
>> systems). Therefore it is unclear to me what "those pages" is actually
>> referring to above. Surely new Xen can't be given any pages in use
>> _in any way_ by old Xen, no matter whether it's ones assigned to
>> domains, or ones used internally to (old) Xen.
> 
> Old and new Xen do not coexist. There is a kexec (via kexec_reloc.S and
> purgatory) from old to new.
> 
> There are some pages which new Xen MUST NOT scribble on, because they
> actually belong to the domains being preserved. That includes the EPT
> (or at least IOMMU) page tables.

And likely interrupt remapping tables, device tables, etc. I don't
have a clear picture on how you want to delineate ones in use in any
such way from ones indeed free to re-use.

Jan

> I suppose new Xen also mustn't scribble on the pages in which old Xen
> has placed the migration information for those domains either. At
> least, not until it's consumed the data.
> 
> Anything else, however, is fine for new Xen to scribble on. Fairly much
> anything that the old Xen had allocated from its xenheap (and not
> subsequently shared to a guest, qv) is no longer required and can be
> treated as free memory by the new Xen, which now owns the machine.
> 
>
Jan Beulich March 6, 2020, 12:29 p.m. UTC | #18
On 06.03.2020 12:59, Durrant, Paul wrote:
>> -----Original Message-----
>> From: David Woodhouse <dwmw2@infradead.org>
>> Sent: 06 March 2020 11:53
>> To: Jan Beulich <jbeulich@suse.com>; Durrant, Paul <pdurrant@amazon.co.uk>
>> Cc: julien@xen.org; andrew.cooper3@citrix.com; sstabellini@kernel.org; konrad.wilk@oracle.com;
>> Volodymyr_Babchuk@epam.com; ian.jackson@eu.citrix.com; wl@xen.org; george.dunlap@citrix.com; xen-
>> devel@lists.xenproject.org
>> Subject: RE: [EXTERNAL][PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>
>> On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
>>> I've started looking at the latest version of Paul's series, but I'm
>>> still struggling to see the picture: There's no true distinction
>>> between Xen heap and domain heap on x86-64 (except on very large
>>> systems). Therefore it is unclear to me what "those pages" is actually
>>> referring to above. Surely new Xen can't be given any pages in use
>>> _in any way_ by old Xen, no matter whether it's ones assigned to
>>> domains, or ones used internally to (old) Xen.
>>
>> Old and new Xen do not coexist. There is a kexec (via kexec_reloc.S and
>> purgatory) from old to new.
>>
>> There are some pages which new Xen MUST NOT scribble on, because they
>> actually belong to the domains being preserved. That includes the EPT
>> (or at least IOMMU) page tables.
>>
>> I suppose new Xen also mustn't scribble on the pages in which old Xen
>> has placed the migration information for those domains either. At
>> least, not until it's consumed the data.
>>
>> Anything else, however, is fine for new Xen to scribble on. Fairly much
>> anything that the old Xen had allocated from its xenheap (and not
>> subsequently shared to a guest, qv) is no longer required and can be
>> treated as free memory by the new Xen, which now owns the machine.
>>
> 
> ... so getting rid of shared xenheap pages altogether just makes life easier.

How do you tell pages in use by domains from ones free to re-use?
Because of the overloading of struct page_info, I expect you can't
judge by just looking at a page's struct page_info instance. Are
you peeking into the migration streams for the domains to collect
all the pages? And are you walking IOMMU structures to collect the
ones used for but not accessible by the domains?

Jan
Jan Beulich March 6, 2020, 12:36 p.m. UTC | #19
On 06.03.2020 13:20, David Woodhouse wrote:
> On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
>>> For live update we need to give a region of memory to the new Xen which
>>> it can use for its boot allocator, before it's handled any of the live
>>> update records and before it knows which *other* memory is still
>>> available for use.
>>>
>>> In order to do that, the original Xen has to ensure that it *doesn't*
>>> use any of that memory region for domain-owned pages which would need
>>> to be preserved.
>>>
>>> So far in the patches I've posted upstream I have cheated, and simply
>>> *not* added them to the main heap. Anything allocated before
>>> end_boot_allocator() is fine because it is "ephemeral" to the first Xen
>>> and doesn't need to be preserved (it's mostly frame tables and a few
>>> PTE pages).
>>>
>>> Paul's work is making it possible to use those pages as xenheap pages,
>>> safe in the knowledge that they *won't* end up being mapped to domains,
>>> and won't need to be preserved across live update.
>>
>> I've started looking at the latest version of Paul's series, but I'm
>> still struggling to see the picture: There's no true distinction
>> between Xen heap and domain heap on x86-64 (except on very large
>> systems). Therefore it is unclear to me what "those pages" is actually
>> referring to above. Surely new Xen can't be given any pages in use
>> _in any way_ by old Xen, no matter whether it's ones assigned to
>> domains, or ones used internally to (old) Xen.
> 
> Hm, I'm not sure my previous response actually answered your question;
> sorry, I've been away all week so it's still Monday morning in my head
> right now. Let me try again...
> 
> What I said just now is true. The new Xen can use anything that isn't
> actually owned by domains, but old Xen is dead and any of its own
> internal allocations, Xen page tables and data structures (i.e. most of
> what it allocated on its xenheap) have died with it and those pages are
> considered 'free' by the new Xen.
> 
> Theoretically, it would be possible for the new Xen to go directly to
> that state. The live update data could be processed *early* in the new
> Xen before the boot allocator is even running, and new Xen could prime
> its boot allocator with the memory ranges that happen to be available
> according to the criteria outlined above.
> 
> Our initial implementation did that, in fact. It was complex in early
> boot, and it didn't scale to more than 512 separate free ranges because
> the boot allocator panics if it has more free regions than that.
> 
> That's why we settled on the model of reserving a specific region for
> the new Xen to use for its boot allocator. Old Xen promises that it
> won't put anything into that region that needs to be preserved over
> kexec, and then the startup process for the new Xen is much simpler; it
> can use that contiguous region for its boot allocations and then
> process the live update data in a better environment once things like
> vmap() are already available. Then *finally* it can add the rest of the
> system memory that *isn't* used by running domains, into the buddy
> allocator.
> 
> So this requires old Xen to promise that it *won't* put anything into
> that region of reserved bootmem (aka "those pages"), that needs to be
> preserved across kexec. That promise is *mostly* equivalent to "will
> only allocate xenheap pages from those pages"... except for the fact
> that sometimes we allocate a page from the xenheap and share it to
> domains.
> 
> Thus, "don't do that then", and THEN we can say that it's OK for
> xenheap allocations to come from the reserved bootmem region, but not
> domheap allocations.

Oh, so really this is an optimization to allow the memory range to
not remain unused altogether by "old" Xen, i.e. unlike the kexec
range. And of course this means you're intending to (at least
partially) resurrect the distinction between domheap and xenheap,
which isn't said anywhere in Paul's series, I don't think. If this
is a sufficiently correct understanding of mine, then on one hand
I start seeing the point of the conversion Paul wants to make, but
otoh this then feels a little like making the 2nd step before the
1st.

Jan
David Woodhouse March 6, 2020, 12:37 p.m. UTC | #20
On Fri, 2020-03-06 at 13:25 +0100, Jan Beulich wrote:
> And likely interrupt remapping tables, device tables, etc. I don't
> have a clear picture on how you want to delineate ones in use in any
> such way from ones indeed free to re-use.

Right. The solution there is two-fold:

For pages in the general population (outside the reserved bootmem), the
responsibility lies with the new Xen. As it processes the live update
information that it receives from the old Xen, it must mark those pages
as in-use so that it doesn't attempt to allocate them.

That's what this bugfix paves the way for — it avoids putting *bad*
pages into the buddy allocator, by setting the page state before the
page is seen by init_heap_pages(), and making init_heap_pages() skip
the pages marked as broken.

It's trivial, then, to make init_heap_pages() *also* skip pages which
get marked as "already in use" when we process the live update data.


The second part, as discussed, is that the old Xen must not put any of
those "needs to be preserved" pages into the reserved bootmem region.

That's what Paul is working on. We stop sharing xenheap pages to
domains, which is part of it — but *also* we need to use the right
allocation for any IOMMU page tables and IRQ remapping tables which
need to be preserved, etc. 

That partly comes out of Hongyan's secret hiding work anyway, since we
no longer get to assume that xenheap pages are already mapped *anyway*,
so we might as *well* be allocating those from domheap.
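
Concretely, the skip being described amounts to something like the following paraphrase (not the actual xen/common/page_alloc.c change; lu_page_is_preserved() is an invented helper):

static void init_heap_pages(struct page_info *pg, unsigned long nr_pages)
{
    unsigned long i;

    for ( i = 0; i < nr_pages; i++ )
    {
        /* Broken pages, and pages claimed while processing the live
         * update data, never enter the buddy allocator. */
        if ( page_state_is(&pg[i], broken) || lu_page_is_preserved(&pg[i]) )
            continue;

        free_heap_pages(&pg[i], 0, false);
    }
}
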
David Woodhouse March 6, 2020, 12:49 p.m. UTC | #21
On Fri, 2020-03-06 at 13:29 +0100, Jan Beulich wrote:
> How do you tell pages in use by domains from ones free to re-use?
> Because of the overloading of struct page_info, I expect you can't
> judge by just looking at a page's struct page_info instance. Are
> you peeking into the migration streams for the domains to collect
> all the pages? And are you walking IOMMU structures to collect the
> ones used for but not accessible by the domains?

I just outlined the two-part nature of the issue. First the old Xen
must ensure *not* to put any pages that need to be preserved, in the
reserved region.

You're talking about the second part, where the new Xen has to work out
what pages in the *rest* of memory are available to it and which it
needs to preserve.

Which means my first answer has to be "hell no, you can't even *talk*
about the page_info here". Because what we pass from Xen#1 to Xen#2 has
to be an *ABI*, with clearly defined forward-compatible structures.
Never passing over Xen-internal structs like the page_info.

So yes, the new Xen has to infer it from the migration structures for
the domains, and mark the appropriate pages as 'in use' before
init_heap_pages() gets to look at them.

But bear in mind that we can *define* the structures we use for this
too, based on top of the existing live migration data structures.

We don't want to have to actually walk the hardware page tables in the
new Xen. We'll probably end up passing over a list of pages, from old
Xen to new in a newly-defined record type. And old Xen would just keep
that list as it allocates pages for those page tables. Much as it keeps
the page list for domains.
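
In code terms, something along these lines would run before init_heap_pages() looks at the memory. Every name here is hypothetical; as noted above, the real record layout is part of a forward-compatible ABI that is still to be defined.

struct lu_page_record {
    uint64_t mfn;
    uint64_t nr_frames;
};

static void __init lu_claim_preserved_pages(const struct lu_page_record *rec,
                                            unsigned int nr)
{
    unsigned int i;
    uint64_t j;

    for ( i = 0; i < nr; i++ )
        for ( j = 0; j < rec[i].nr_frames; j++ )
            /* Flag the frame so init_heap_pages() will later skip it. */
            mark_page_preserved(mfn_add(_mfn(rec[i].mfn), j));
}
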
Jan Beulich March 6, 2020, 12:55 p.m. UTC | #22
On 06.03.2020 13:37, David Woodhouse wrote:
> On Fri, 2020-03-06 at 13:25 +0100, Jan Beulich wrote:
>> And likely interrupt remapping tables, device tables, etc. I don't
>> have a clear picture on how you want to delineate ones in use in any
>> such way from ones indeed free to re-use.
> 
> Right. The solution there is two-fold:
> 
> For pages in the general population (outside the reserved bootmem), the
> responsibility lies with the new Xen. As it processes the live update
> information that it receives from the old Xen, it must mark those pages
> as in-use so that it doesn't attempt to allocate them.
> 
> That's what this bugfix paves the way for — it avoids putting *bad*
> pages into the buddy allocator, by setting the page state before the
> page is seen by init_heap_pages(), and making init_heap_pages() skip
> the pages marked as broken.
> 
> It's trivial, then, to make init_heap_pages() *also* skip pages which
> get marked as "already in use" when we process the live update data.
> 
> 
> The second part, as discussed, is that the old Xen must not put any of
> those "needs to be preserved" pages into the reserved bootmem region.
> 
> That's what Paul is working on. We stop sharing xenheap pages to
> domains, which is part of it — but *also* we need to use the right
> allocation for any IOMMU page tables and IRQ remapping tables which
> need to be preserved, etc. 

I'm sorry, but this doesn't really make things much more clear.
Further up you say "As it processes the live update information
...", i.e. that's a case where you get positive indication that a
page is in use. You also have that reserved region, where old Xen
promises to not put anything that needs to survive. (It remains
unclear what exact form and shape this is meant to take, as I
hope you don't mean to re-introduce a Xen heap with static
boundaries, entirely distinct from the domain heap.) But the
situation for all other pages remains rather nebulous to me. Yet
by a certain point in time new Xen will want to take control of
all memory, i.e. know of the used (or not) status of all pages
in the system.

Jan
David Woodhouse March 6, 2020, 12:57 p.m. UTC | #23
On Fri, 2020-03-06 at 13:36 +0100, Jan Beulich wrote:
> Oh, so really this is an optimization to allow the memory range to
> not remain unused altogether by "old" Xen, i.e. unlike the kexec
> range. 

Right. At the moment I just *don't* use the pages in the reserved
region (and that includes inittext/initdata freed by Xen, since the
plan is for new Xen to be placed where old Xen used to be in physical
memory).

From
https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=commitdiff;h=cdbef644824

void init_lu_reserved_pages(paddr_t ps, paddr_t pe)
{
    if (!lu_bootmem_start)
        init_xenheap_pages(ps, pe);

    /* There is ongoing work for other reasons to eliminate the use of
     * share_xen_page_with_guest() and get to a point where the normal
     * xenheap actually meets the requirement we need for live update
     * reserved memory, that nothing allocated from it will be mapped
     * to a guest and/or need to be preserved over a live update.
     * Until then, we simply don't use these pages after boot. */
}


> And of course this means you're intending to (at least
> partially) resurrect the distinction between domheap and xenheap,
> which isn't said anywhere in Paul's series, I don't think.

Right. Secret hiding makes the distinction (xenheap is mapped, domheap
is not) mostly go away. We are talking about restoring *a* distinction
between one type of page (Xen ephemeral pages which don't need to be
preserved over live update) and another (must be preserved), but
whether that should still be called "xenheap" vs. "domheap", despite
the massive parallels, isn't entirely clear.

>  If this
> is a sufficiently correct understanding of mine, then on one hand
> I start seeing the point of the conversion Paul wants to make, but
> otoh this then feels a little like making the 2nd step before the
> 1st.


What would you suggest is the first step?
David Woodhouse March 6, 2020, 1:08 p.m. UTC | #24
On Fri, 2020-03-06 at 13:55 +0100, Jan Beulich wrote:
> I'm sorry, but this doesn't really make things much more clear.
> Further up you say "As it processes the live update information
> ...", i.e. that's a case where you get positive indication that a
> page is in use. You also have that reserved region, where old Xen
> promises to not put anything that needs to survive. (It remains
> unclear what exact form and shape this is meant to take, as I
> hope you don't mean to re-introduce a Xen heap with static
> boundaries, entirely distinct from the domain heap.) But the
> situation for all other pages remains rather nebulous to me. Yet
> by a certain point in time new Xen will want to take control of
> all memory, i.e. know of the used (or not) status of all pages
> in the system.

In terms of the design discussion for live update, I don't know that
"xenheap" and "domheap" are the right terms to use as they have a
loaded existing meaning (albeit one which is being blurred a lot as the
secret hiding happens).

Let's say "ephemeral pages" and "preserved pages".

If Xen needs to allocate a page which needs to be preserved over live
update, then it can only come from *outside* the reserved bootmem
region.

If Xen needs to allocate an ephemeral page (which can be thrown away on
live update), that can come from *anywhere*. But we might *start* by
looking in the reserved region, and then allocate from the general
population of memory after the reserved region has been used up.

Obviously, the *usage* of this maps fairly closely to the existing
xenheap and domheap. But we wouldn't end up with a hard partition
between 'xenheap' and 'domheap' as we have had on some architectures
before. We'd satisfy xenheap allocations *first* from the xenheap-only
reserved ranges, and then from the rest of memory (domheap).
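
A sketch of that split, with alloc_lu_reserved_page() standing in for whatever ends up carving pages out of the reserved range (both function names below are invented):

void *alloc_ephemeral_page(void)
{
    void *p = alloc_lu_reserved_page();

    /* Spill into the general population once the reserved range is used up. */
    return p ?: alloc_xenheap_page();
}

struct page_info *alloc_preserved_page(struct domain *d, unsigned int memflags)
{
    /* Preserved pages must never come from the reserved range; the
     * ordinary domheap allocator already satisfies that. */
    return alloc_domheap_page(d, memflags);
}
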
Jan Beulich March 6, 2020, 1:10 p.m. UTC | #25
On 06.03.2020 13:57, David Woodhouse wrote:
> On Fri, 2020-03-06 at 13:36 +0100, Jan Beulich wrote:
>> And of course this means you're intending to (at least
>> partially) resurrect the distinction between domheap and xenheap,
>> which isn't said anywhere in Paul's series, I don't think.
> 
> Right. Secret hiding makes the distinction (xenheap is mapped, domheap
> is not) mostly go away. We are talking about restoring *a* distinction
> between one type of page (Xen ephemeral pages which don't need to be
> preserved over live update) and another (must be preserved), but
> whether that should still be called "xenheap" vs. "domheap", despite
> the massive parallels, isn't entirely clear.
> 
>>  If this
>> is a sufficiently correct understanding of mine, then on one hand
>> I start seeing the point of the conversion Paul wants to make, but
>> otoh this then feels a little like making the 2nd step before the
>> 1st.
> 
> 
> What would you suggest is the first step?

Establishing of what the new separation rule and mechanism is going
to be (no matter how the two resulting pieces are going to be
named).

Jan
Paul Durrant March 6, 2020, 1:13 p.m. UTC | #26
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 06 March 2020 13:10
> To: David Woodhouse <dwmw2@infradead.org>; Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: sstabellini@kernel.org; julien@xen.org; wl@xen.org; konrad.wilk@oracle.com;
> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; george.dunlap@citrix.com; xen-
> devel@lists.xenproject.org; Volodymyr_Babchuk@epam.com
> Subject: Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On 06.03.2020 13:57, David Woodhouse wrote:
> > On Fri, 2020-03-06 at 13:36 +0100, Jan Beulich wrote:
> >> And of course this means you're intending to (at least
> >> partially) resurrect the distinction between domheap and xenheap,
> >> which isn't said anywhere in Paul's series, I don't think.
> >
> > Right. Secret hiding makes the distinction (xenheap is mapped, domheap
> > is not) mostly go away. We are talking about restoring *a* distinction
> > between one type of page (Xen ephemeral pages which don't need to be
> > preserved over live update) and another (must be preserved), but
> > whether that should still be called "xenheap" vs. "domheap", despite
> > the massive parallels, isn't entirely clear.
> >
> >>  If this
> >> is a sufficiently correct understanding of mine, then on one hand
> >> I start seeing the point of the conversion Paul wants to make, but
> >> otoh this then feels a little like making the 2nd step before the
> >> 1st.
> >
> >
> > What would you suggest is the first step?
> 
> Establishing of what the new separation rule and mechanism is going
> to be (no matter how the two resulting pieces are going to be
> named).
> 

Would you be ok with a comment to that effect? My aim is to make the separation abundantly obvious by getting rid of shared xenheap pages (for non-system domains at least) but I can't do that before converting shared_info and grant shared/status frames to domheap.

  Paul

> Jan
> 
David Woodhouse March 6, 2020, 1:15 p.m. UTC | #27
On Fri, 2020-03-06 at 14:10 +0100, Jan Beulich wrote:
> Establishing of what the new separation rule and mechanism is going
> to be (no matter how the two resulting pieces are going to be
> named).

Paul's opinion seemed to be that with secret hiding coming along and
destroying the "xenheap is mapped anyway" assumption, the benefit of
allocating a xenheap page and then mapping it to a guest is basically
gone *anyway*, so that part at least made a viable cleanup regardless
of the overall direction.

By the time we start making the IOMMU use domheap for *its* page table
allocations because we want 'domheap ≡ preserved pages', yes I
completely agree that we should have settled the overall direction and
nomenclature first.

But I'm inclined to agree with Paul that this part stands by itself as
a cleanup regardless of that.
Paul Durrant March 6, 2020, 1:22 p.m. UTC | #28
> -----Original Message-----
> From: David Woodhouse <dwmw2@infradead.org>
> Sent: 06 March 2020 13:16
> To: Jan Beulich <jbeulich@suse.com>; Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: julien@xen.org; andrew.cooper3@citrix.com; sstabellini@kernel.org; konrad.wilk@oracle.com;
> Volodymyr_Babchuk@epam.com; ian.jackson@eu.citrix.com; wl@xen.org; george.dunlap@citrix.com; xen-
> devel@lists.xenproject.org
> Subject: RE: [EXTERNAL][PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On Fri, 2020-03-06 at 14:10 +0100, Jan Beulich wrote:
> > Establishing of what the new separation rule and mechanism is going
> > to be (no matter how the two resulting pieces are going to be
> > named).
> 
> Paul's opinion seemed to be that with secret hiding coming along and
> destroying the "xenheap is mapped anyway" assumption, the benefit of
> allocating a xenheap page and then mapping it to a guest is basically
> gone *anyway*, so that part at least made a viable cleanup regardless
> of the overall direction.

Indeed, that is my opinion. The distinction between a mapped domheap page and a xenheap page basically goes away with secret hiding since the direct map is basically gone, so, given that it helps simplify LU *and* gets rid of the domain xenheap list (and hence the somewhat subtle processing of that list in domain_kill()), getting rid of shared xenheap pages seems like a useful thing to do.

  Paul
Jan Beulich March 6, 2020, 1:23 p.m. UTC | #29
On 06.03.2020 14:13, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>> Sent: 06 March 2020 13:10
>> To: David Woodhouse <dwmw2@infradead.org>; Durrant, Paul <pdurrant@amazon.co.uk>
>> Cc: sstabellini@kernel.org; julien@xen.org; wl@xen.org; konrad.wilk@oracle.com;
>> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; george.dunlap@citrix.com; xen-
>> devel@lists.xenproject.org; Volodymyr_Babchuk@epam.com
>> Subject: Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>
>> On 06.03.2020 13:57, David Woodhouse wrote:
>>> On Fri, 2020-03-06 at 13:36 +0100, Jan Beulich wrote:
>>>> And of course this means you're intending to (at least
>>>> partially) resurrect the distinction between domheap and xenheap,
>>>> which isn't said anywhere in Paul's series, I don't think.
>>>
>>> Right. Secret hiding makes the distinction (xenheap is mapped, domheap
>>> is not) mostly go away. We are talking about restoring *a* distinction
>>> between one type of page (Xen ephemeral pages which don't need to be
>>> preserved over live update) and another (must be preserved), but
>>> whether that should still be called "xenheap" vs. "domheap", despite
>>> the massive parallels, isn't entirely clear.
>>>
>>>>  If this
>>>> is a sufficiently correct understanding of mine, then on one hand
>>>> I start seeing the point of the conversion Paul wants to make, but
>>>> otoh this then feels a little like making the 2nd step before the
>>>> 1st.
>>>
>>>
>>> What would you suggest is the first step?
>>
>> Establishing of what the new separation rule and mechanism is going
>> to be (no matter how the two resulting pieces are going to be
>> named).
>>
> 
> Would you be ok with a comment to that effect?

Not sure. It would certainly help if the cover letter at least
mentioned other related aspects like this one.

> My aim is to make the separation abundantly obvious by getting rid
> of shared xenheap pages (for non-system domains at least) but I
> can't do that before converting shared_info and grant shared/status
> frames to domheap.

Following David's various replies - instead of going this route of
replacing the sharing of xenheap pages by different logic, the
same ought to be achievable by making the backing allocations come
from the correct pool?

Jan
Paul Durrant March 6, 2020, 1:26 p.m. UTC | #30
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 06 March 2020 13:24
> To: Paul Durrant <xadimgnik@gmail.com>
> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> Subject: Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On 06.03.2020 14:13, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >> Sent: 06 March 2020 13:10
> >> To: David Woodhouse <dwmw2@infradead.org>; Durrant, Paul <pdurrant@amazon.co.uk>
> >> Cc: sstabellini@kernel.org; julien@xen.org; wl@xen.org; konrad.wilk@oracle.com;
> >> andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com; george.dunlap@citrix.com; xen-
> >> devel@lists.xenproject.org; Volodymyr_Babchuk@epam.com
> >> Subject: Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> >>
> >> On 06.03.2020 13:57, David Woodhouse wrote:
> >>> On Fri, 2020-03-06 at 13:36 +0100, Jan Beulich wrote:
> >>>> And of course this means you're intending to (at least
> >>>> partially) resurrect the distinction between domheap and xenheap,
> >>>> which isn't said anywhere in Paul's series, I don't think.
> >>>
> >>> Right. Secret hiding makes the distinction (xenheap is mapped, domheap
> >>> is not) mostly go away. We are talking about restoring *a* distinction
> >>> between one type of page (Xen ephemeral pages which don't need to be
> >>> preserved over live update) and another (must be preserved), but
> >>> whether that should still be called "xenheap" vs. "domheap", despite
> >>> the massive parallels, isn't entirely clear.
> >>>
> >>>>  If this
> >>>> is a sufficiently correct understanding of mine, then on one hand
> >>>> I start seeing the point of the conversion Paul wants to make, but
> >>>> otoh this then feels a little like making the 2nd step before the
> >>>> 1st.
> >>>
> >>>
> >>> What would you suggest is the first step?
> >>
> >> Establishing of what the new separation rule and mechanism is going
> >> to be (no matter how the two resulting pieces are going to be
> >> named).
> >>
> >
> > Would you be ok with a comment to that effect?
> 
> Not sure. It would certainly help if the cover letter at least
> mentioned other related aspects like this one.
> 
> > My aim is to make the separation abundantly obvious by getting rid
> > of shared xenheap pages (for non-system domains at least) but I
> > can't do that before converting shared_info and grant shared/status
> > frames to domheap.
> 
> Following David's various replies - instead of going this route of
> replacing the sharing of xenheap pages by different logic, the
> same ought to be achievable by making the backing allocations come
> from the correct pool?
> 

I still prefer the simplification of not having to clean up the shared xenheap page list in domain_kill() so IMO it is still worth it for that alone.

  Paul
Jan Beulich March 6, 2020, 1:36 p.m. UTC | #31
On 06.03.2020 14:26, Paul Durrant wrote:
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>> Sent: 06 March 2020 13:24
>>
>> On 06.03.2020 14:13, Paul Durrant wrote:
>>> My aim is to make the separation abundantly obvious by getting rid
>>> of shared xenheap pages (for non-system domains at least) but I
>>> can't do that before converting shared_info and grant shared/status
>>> frames to domheap.
>>
>> Following David's various replies - instead of going this route of
>> replacing the sharing of xenheap pages by different logic, the
>> same ought to be achievable by making the backing allocations come
>> from the correct pool?
>>
> 
> I still prefer the simplification of not having to clean up the
> shared xenheap page list in domain_kill() so IMO it is still worth
> it for that alone.

I don't see anything very special with the cleaning up in
domain_kill() / domain_relinquish_resources(). What I'd view as
more desirable in this regard is the general fact of needing
two lists. But you realize there's a downside to this as well?
dump_pageframe_info() will reliably show _all_ Xen heap pages
associated with a domain, but it will only ever show up to 10
pages on ->page_list for a domain that's not already being
cleaned up.

Jan
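
For readers without the tree to hand, the shape Jan refers to is roughly the following, heavily trimmed and only as precise as the description above:

void dump_pageframe_info(struct domain *d)
{
    struct page_info *page;

    /* The assigned-page list is only walked for small or dying domains... */
    if ( domain_tot_pages(d) >= 10 && d->is_dying < DOMDYING_dead )
        printk("    DomPage list too long to display\n");
    else
        page_list_for_each ( page, &d->page_list )
            printk("    DomPage mfn %"PRI_mfn"\n", mfn_x(page_to_mfn(page)));

    /* ...whereas the Xen heap list is always dumped in full. */
    page_list_for_each ( page, &d->xenpage_list )
        printk("    XenPage mfn %"PRI_mfn"\n", mfn_x(page_to_mfn(page)));
}
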
Paul Durrant March 6, 2020, 1:41 p.m. UTC | #32
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 March 2020 13:36
> To: Paul Durrant <xadimgnik@gmail.com>
> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On 06.03.2020 14:26, Paul Durrant wrote:
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >> Sent: 06 March 2020 13:24
> >>
> >> On 06.03.2020 14:13, Paul Durrant wrote:
> >>> My aim is to make the separation abundantly obvious by getting rid
> >>> of shared xenheap pages (for non-system domains at least) but I
> >>> can't do that before converting shared_info and grant shared/status
> >>> frames to domheap.
> >>
> >> Following David's various replies - instead of going this route of
> >> replacing the sharing of xenheap pages by different logic, the
> >> same ought to be achievable by making the backing allocations come
> >> from the correct pool?
> >>
> >
> > I still prefer the simplification of not having to clean up the
> > shared xenheap page list in domain_kill() so IMO it is still worth
> > it for that alone.
> 
> I don't see anything very special with the cleaning up in
> domain_kill() / domain_relinquish_resources(). What I'd view as
> more desirable in this regard is the general fact of needing
> two lists. But you realize there's a downside to this as well?
> dump_pageframe_info() will reliably show _all_ Xen heap pages
> associated with a domain, but it will only ever show up to 10
> pages on ->page_list for a domain that's not already being
> cleaned up.

That's not much of a downside though, I don't think. The xenheap pages (or PGC_extra domheap pages) are 'special', and so info about them ought to be available via an alternate dump function anyway (and if not already, it can be added).

  Paul

> 
> Jan
Jan Beulich March 6, 2020, 1:46 p.m. UTC | #33
On 06.03.2020 14:41, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 06 March 2020 13:36
>> To: Paul Durrant <xadimgnik@gmail.com>
>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>
>> On 06.03.2020 14:26, Paul Durrant wrote:
>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>>>> Sent: 06 March 2020 13:24
>>>>
>>>> On 06.03.2020 14:13, Paul Durrant wrote:
>>>>> My aim is to make the separation abundantly obvious by getting rid
>>>>> of shared xenheap pages (for non-system domains at least) but I
>>>>> can't do that before converting shared_info and grant shared/status
>>>>> frames to domheap.
>>>>
>>>> Following David's various replies - instead of going this route of
>>>> replacing the sharing of xenheap pages by different logic, the
>>>> same ought to be achievable by making the backing allocations come
>>>> from the correct pool?
>>>>
>>>
>>> I still prefer the simplification of not having to clean up the
>>> shared xenheap page list in domain_kill() so IMO it is still worth
>>> it for that alone.
>>
>> I don't see anything very special with the cleaning up in
>> domain_kill() / domain_relinquish_resources(). What I'd view as
>> more desirable in this regard is the general fact of needing
>> two lists. But you realize there's a downside to this as well?
>> dump_pageframe_info() will reliably show _all_ Xen heap pages
>> associated with a domain, but it will only ever show up to 10
>> pages on ->page_list for a domain that's not already being
>> cleaned up.
> 
> That's not much of a downside though I don't think. The xenheap
> (or PGC_extra domheap pages) are 'special' and so info about
> them ought to be available via an alternate dump function anyway
> (and if not already, it can be added).

Whatever you'd add, the logic would need to either traverse the
entire ->page_list (can take very long) or have/use out of band
information where such pages may have a record (fragile).

Jan
Paul Durrant March 6, 2020, 4:27 p.m. UTC | #34
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 March 2020 13:46
> To: Paul Durrant <xadimgnik@gmail.com>
> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On 06.03.2020 14:41, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 06 March 2020 13:36
> >> To: Paul Durrant <xadimgnik@gmail.com>
> >> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> >> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> >> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> >> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> >>
> >> On 06.03.2020 14:26, Paul Durrant wrote:
> >>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >>>> Sent: 06 March 2020 13:24
> >>>>
> >>>> On 06.03.2020 14:13, Paul Durrant wrote:
> >>>>> My aim is to make the separation abundantly obvious by getting rid
> >>>>> of shared xenheap pages (for non-system domains at least) but I
> >>>>> can't do that before converting shared_info and grant shared/status
> >>>>> frames to domheap.
> >>>>
> >>>> Following David's various replies - instead of going this route of
> >>>> replacing the sharing of xenheap pages by different logic, the
> >>>> same ought to be achievable by making the backing allocations come
> >>>> from the correct pool?
> >>>>
> >>>
> >>> I still prefer the simplification of not having to clean up the
> >>> shared xenheap page list in domain_kill() so IMO it is still worth
> >>> it for that alone.
> >>
> >> I don't see anything very special with the cleaning up in
> >> domain_kill() / domain_relinquish_resources(). What I'd view as
> >> more desirable in this regard is the general fact of needing
> >> two lists. But you realize there's a downside to this as well?
> >> dump_pageframe_info() will reliably show _all_ Xen heap pages
> >> associated with a domain, but it will only ever show up to 10
> >> pages on ->page_list for a domain that's not already being
> >> cleaned up.
> >
> > That's not much of a downside though I don't think. The xenheap
> > (or PGC_extra domheap pages) are 'special' and so info about
> > them ought to be available via an alternate dump function anyway
> > (and if not already, it can be added).
> 
> Whatever you'd add, the logic would need to either traverse the
> entire ->page_list (can take very long) or have/use out of band
> information where such pages may have a record (fragile).
> 

But the shared xenheap pages in question are only the shared_info page or the grant table frames, so their information can be dumped separately.
I guess it makes more sense to add another patch into the series to do an explicit dump of shared_info and then exclude 'special' pages from dump_pageframe_info().

  Paul

> Jan
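
A minimal sketch of what that extra patch might look like; the name dump_special_pages() and its output format are assumptions rather than part of the series, while d->shared_info.{mfn,virt} are the fields the patch below operates on:

static void dump_special_pages(const struct domain *d)
{
    if ( d->shared_info.virt )
        printk("    shared_info: mfn %"PRI_mfn", caf=%08lx\n",
               mfn_x(d->shared_info.mfn),
               mfn_to_page(d->shared_info.mfn)->count_info);

    /* Grant shared/status frames would get similar lines here, and the
     * generic dump would skip anything carrying PGC_extra. */
}
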
Jan Beulich March 6, 2020, 5:16 p.m. UTC | #35
On 06.03.2020 17:27, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 06 March 2020 13:46
>> To: Paul Durrant <xadimgnik@gmail.com>
>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>
>> On 06.03.2020 14:41, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 06 March 2020 13:36
>>>> To: Paul Durrant <xadimgnik@gmail.com>
>>>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
>>>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
>>>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
>>>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>>>
>>>> On 06.03.2020 14:26, Paul Durrant wrote:
>>>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>>>>>> Sent: 06 March 2020 13:24
>>>>>>
>>>>>> On 06.03.2020 14:13, Paul Durrant wrote:
>>>>>>> My aim is to make the separation abundantly obvious by getting rid
>>>>>>> of shared xenheap pages (for non-system domains at least) but I
>>>>>>> can't do that before converting shared_info and grant shared/status
>>>>>>> frames to domheap.
>>>>>>
>>>>>> Following David's various replies - instead of going this route of
>>>>>> replacing the sharing of xenheap pages by different logic, the
>>>>>> same ought to be achievable by making the backing allocations come
>>>>>> from the correct pool?
>>>>>>
>>>>>
>>>>> I still prefer the simplification of not having to clean up the
>>>>> shared xenheap page list in domain_kill() so IMO it is still worth
>>>>> it for that alone.
>>>>
>>>> I don't see anything very special with the cleaning up in
>>>> domain_kill() / domain_relinquish_resources(). What I'd view as
>>>> more desirable in this regard is the general fact of needing
>>>> two lists. But you realize there's a downside to this as well?
>>>> dump_pageframe_info() will reliably show _all_ Xen heap pages
>>>> associated with a domain, but it will only ever show up to 10
>>>> pages on ->page_list for a domain that's not already being
>>>> cleaned up.
>>>
>>> That's not much of a downside though I don't think. The xenheap
>>> (or PGC_extra domheap pages) are 'special' and so info about
>>> them ought to be available via an alternate dump function anyway
>>> (and if not already, it can be added).
>>
>> Whatever you'd add, the logic would need to either traverse the
>> entire ->page_list (can take very long) or have/use out of band
>> information where such pages may have a record (fragile).
>>
> 
> But the shared xenheap pages in question are only shared info, or
> grant table, so their information can be dumped separately.
> I guess it makes more sense to add another patch into the series
> to do explicit dump of shared_info and then exclude 'special'
> pages from dump_pageframe_info().

But that's why I said "fragile" - new uses of such pages wouldn't
automatically be picked up, whereas them all landing on xenpage_list
made their dumping a reliable thing.

Jan
Paul Durrant March 9, 2020, 8:48 a.m. UTC | #36
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 March 2020 17:17
> To: Paul Durrant <xadimgnik@gmail.com>
> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> 
> On 06.03.2020 17:27, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 06 March 2020 13:46
> >> To: Paul Durrant <xadimgnik@gmail.com>
> >> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> >> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> >> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> >> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> >>
> >> On 06.03.2020 14:41, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 06 March 2020 13:36
> >>>> To: Paul Durrant <xadimgnik@gmail.com>
> >>>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
> >>>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> >>>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
> >>>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
> >>>>
> >>>> On 06.03.2020 14:26, Paul Durrant wrote:
> >>>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >>>>>> Sent: 06 March 2020 13:24
> >>>>>>
> >>>>>> On 06.03.2020 14:13, Paul Durrant wrote:
> >>>>>>> My aim is to make the separation abundantly obvious by getting rid
> >>>>>>> of shared xenheap pages (for non-system domains at least) but I
> >>>>>>> can't do that before converting shared_info and grant shared/status
> >>>>>>> frames to domheap.
> >>>>>>
> >>>>>> Following David's various replies - instead of going this route of
> >>>>>> replacing the sharing of xenheap pages by different logic, the
> >>>>>> same ought to be achievable by making the backing allocations come
> >>>>>> from the correct pool?
> >>>>>>
> >>>>>
> >>>>> I still prefer the simplification of not having to clean up the
> >>>>> shared xenheap page list in domain_kill() so IMO it is still worth
> >>>>> it for that alone.
> >>>>
> >>>> I don't see anything very special with the cleaning up in
> >>>> domain_kill() / domain_relinquish_resources(). What I'd view as
> >>>> more desirable in this regard is the general fact of needing
> >>>> two lists. But you realize there's a downside to this as well?
> >>>> dump_pageframe_info() will reliably show _all_ Xen heap pages
> >>>> associated with a domain, but it will only ever show up to 10
> >>>> pages on ->page_list for a domain that's not already being
> >>>> cleaned up.
> >>>
> >>> That's not much of a downside though I don't think. The xenheap
> >>> (or PGC_extra domheap pages) are 'special' and so info about
> >>> them ought to be available via an alternate dump function anyway
> >>> (and if not already, it can be added).
> >>
> >> Whatever you'd add, the logic would need to either traverse the
> >> entire ->page_list (can take very long) or have/use out of band
> >> information where such pages may have a record (fragile).
> >>
> >
> > But the shared xenheap pages in question are only shared info, or
> > grant table, so their information can be dumped separately.
> > I guess it makes more sense to add another patch into the series
> > to do explicit dump of shared_info and then exclude 'special'
> > pages from dump_pageframe_info().
> 
> Bu that's why I said "fragile" - new uses of such pages wouldn't
> automatically be picked up, whereas them all landing on xenpage_list
> made their dumping a reliable thing.
> 

But how useful is dumping xenheap pages this way? There's nothing that actually says what they are for, so I can't see why it is particularly useful. Having something that says 'This is the shared_info page' and 'These are the grant shared frames' seems much more desirable... and any new ones added would almost certainly merit similar dump functions.

  Paul

> Jan
Jan Beulich March 9, 2020, 8:54 a.m. UTC | #37
On 09.03.2020 09:48, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 06 March 2020 17:17
>> To: Paul Durrant <xadimgnik@gmail.com>
>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>
>> On 06.03.2020 17:27, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 06 March 2020 13:46
>>>> To: Paul Durrant <xadimgnik@gmail.com>
>>>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
>>>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
>>>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
>>>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>>>
>>>> On 06.03.2020 14:41, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 06 March 2020 13:36
>>>>>> To: Paul Durrant <xadimgnik@gmail.com>
>>>>>> Cc: sstabellini@kernel.org; julien@xen.org; Volodymyr_Babchuk@epam.com; wl@xen.org;
>>>>>> konrad.wilk@oracle.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
>>>>>> george.dunlap@citrix.com; xen-devel@lists.xenproject.org; 'David Woodhouse' <dwmw2@infradead.org>
>>>>>> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for shared_info
>>>>>>
>>>>>> On 06.03.2020 14:26, Paul Durrant wrote:
>>>>>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>>>>>>>> Sent: 06 March 2020 13:24
>>>>>>>>
>>>>>>>> On 06.03.2020 14:13, Paul Durrant wrote:
>>>>>>>>> My aim is to make the separation abundantly obvious by getting rid
>>>>>>>>> of shared xenheap pages (for non-system domains at least) but I
>>>>>>>>> can't do that before converting shared_info and grant shared/status
>>>>>>>>> frames to domheap.
>>>>>>>>
>>>>>>>> Following David's various replies - instead of going this route of
>>>>>>>> replacing the sharing of xenheap pages by different logic, the
>>>>>>>> same ought to be achievable by making the backing allocations come
>>>>>>>> from the correct pool?
>>>>>>>>
>>>>>>>
>>>>>>> I still prefer the simplification of not having to clean up the
>>>>>>> shared xenheap page list in domain_kill() so IMO it is still worth
>>>>>>> it for that alone.
>>>>>>
>>>>>> I don't see anything very special with the cleaning up in
>>>>>> domain_kill() / domain_relinquish_resources(). What I'd view as
>>>>>> more desirable in this regard is the general fact of needing
>>>>>> two lists. But you realize there's a downside to this as well?
>>>>>> dump_pageframe_info() will reliably show _all_ Xen heap pages
>>>>>> associated with a domain, but it will only ever show up to 10
>>>>>> pages on ->page_list for a domain that's not already being
>>>>>> cleaned up.
>>>>>
>>>>> That's not much of a downside though I don't think. The xenheap
>>>>> (or PGC_extra domheap pages) are 'special' and so info about
>>>>> them ought to be available via an alternate dump function anyway
>>>>> (and if not already, it can be added).
>>>>
>>>> Whatever you'd add, the logic would need to either traverse the
>>>> entire ->page_list (can take very long) or have/use out of band
>>>> information where such pages may have a record (fragile).
>>>>
>>>
>>> But the shared xenheap pages in question are only shared info, or
>>> grant table, so their information can be dumped separately.
>>> I guess it makes more sense to add another patch into the series
>>> to do explicit dump of shared_info and then exclude 'special'
>>> pages from dump_pageframe_info().
>>
>> But that's why I said "fragile" - new uses of such pages wouldn't
>> automatically be picked up, whereas them all landing on xenpage_list
>> made their dumping a reliable thing.
>>
> 
> But how useful is dumping xenheap pages this way? There's nothing
> that actually says what they are for so I can't see why it is
> particularly useful.

That's no different from the domheap page list dumping. The main
point of it - aiui - is to have a hook for finding where possible
leaks sit. For xenheap pages, actually, one can (currently) infer
what they are used for from their position on the list, I think.

> Having something that says 'This is the shared_info page' and
> 'These are the grant shared frames' seems much more desirable...
> and any new ones added would almost certainly merit similar dump
> functions.

Possibly, yet that's different (partly extended, partly
orthogonal) functionality. Doing such may indeed be useful.

Jan
diff mbox series

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2cbcdaac08..3904519256 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -1006,6 +1006,8 @@  int domain_relinquish_resources(struct domain *d)
         BUG();
     }
 
+    free_shared_info(d);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index eb7b0fc51c..3ad532eccf 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -691,7 +691,6 @@  void arch_domain_destroy(struct domain *d)
         pv_domain_destroy(d);
     free_perdomain_mappings(d);
 
-    free_shared_info(d);
     cleanup_domain_irq_mapping(d);
 
     psr_domain_free(d);
@@ -2246,6 +2245,8 @@  int domain_relinquish_resources(struct domain *d)
     if ( is_hvm_domain(d) )
         hvm_domain_relinquish_resources(d);
 
+    free_shared_info(d);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c5f428d67c..787d97d85e 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -695,8 +695,7 @@  int p2m_alloc_table(struct p2m_domain *p2m)
 
     p2m_lock(p2m);
 
-    if ( p2m_is_hostp2m(p2m)
-         && !page_list_empty(&d->page_list) )
+    if ( p2m_is_hostp2m(p2m) && domain_tot_pages(d) )
     {
         P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
         p2m_unlock(p2m);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc16ef2e79..f8f1bbe2f4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -792,6 +792,10 @@  int __init dom0_construct_pv(struct domain *d,
     {
         mfn = mfn_x(page_to_mfn(page));
         BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
+
+        if ( page->count_info & PGC_extra )
+            continue;
+
         if ( get_gpfn_from_mfn(mfn) >= count )
         {
             BUG_ON(is_pv_32bit_domain(d));
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ba7a905258..1d42fbcc0f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -128,9 +128,9 @@  static void vcpu_info_reset(struct vcpu *v)
 {
     struct domain *d = v->domain;
 
-    v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
+    v->vcpu_info = (!d->is_dying && v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
                     ? (vcpu_info_t *)&shared_info(d, vcpu_info[v->vcpu_id])
-                    : &dummy_vcpu_info);
+                    : &dummy_vcpu_info;
     v->vcpu_info_mfn = INVALID_MFN;
 }
 
@@ -1650,24 +1650,37 @@  int continue_hypercall_on_cpu(
 
 int alloc_shared_info(struct domain *d, unsigned int memflags)
 {
-    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
+    struct page_info *pg;
+
+    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
+    if ( !pg )
         return -ENOMEM;
 
-    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
+    if ( !get_page_and_type(pg, d, PGT_writable_page) )
+        return -ENODATA;
+
+    d->shared_info.mfn = page_to_mfn(pg);
+    d->shared_info.virt = __map_domain_page_global(pg);
 
     clear_page(d->shared_info.virt);
-    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
 
     return 0;
 }
 
 void free_shared_info(struct domain *d)
 {
+    struct page_info *pg;
+
     if ( !d->shared_info.virt )
         return;
 
-    free_xenheap_page(d->shared_info.virt);
+    unmap_domain_page_global(d->shared_info.virt);
     d->shared_info.virt = NULL;
+
+    pg = mfn_to_page(d->shared_info.mfn);
+
+    put_page_alloc_ref(pg);
+    put_page_and_type(pg);
 }
 
 /*
diff --git a/xen/common/event_2l.c b/xen/common/event_2l.c
index e1dbb860f4..a72fe0232b 100644
--- a/xen/common/event_2l.c
+++ b/xen/common/event_2l.c
@@ -27,6 +27,7 @@  static void evtchn_2l_set_pending(struct vcpu *v, struct evtchn *evtchn)
      * others may require explicit memory barriers.
      */
 
+    ASSERT(d->shared_info.virt);
     if ( guest_test_and_set_bit(d, port, &shared_info(d, evtchn_pending)) )
         return;
 
@@ -54,6 +55,7 @@  static void evtchn_2l_unmask(struct domain *d, struct evtchn *evtchn)
      * These operations must happen in strict order. Based on
      * evtchn_2l_set_pending() above.
      */
+    ASSERT(d->shared_info.virt);
     if ( guest_test_and_clear_bit(d, port, &shared_info(d, evtchn_mask)) &&
          guest_test_bit(d, port, &shared_info(d, evtchn_pending)) &&
          !guest_test_and_set_bit(d, port / BITS_PER_EVTCHN_WORD(d),
@@ -67,6 +69,7 @@  static bool evtchn_2l_is_pending(const struct domain *d, evtchn_port_t port)
 {
     unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
 
+    ASSERT(d->shared_info.virt);
     ASSERT(port < max_ports);
     return (port < max_ports &&
             guest_test_bit(d, port, &shared_info(d, evtchn_pending)));
@@ -76,6 +79,7 @@  static bool evtchn_2l_is_masked(const struct domain *d, evtchn_port_t port)
 {
     unsigned int max_ports = BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
 
+    ASSERT(d->shared_info.virt);
     ASSERT(port < max_ports);
     return (port >= max_ports ||
             guest_test_bit(d, port, &shared_info(d, evtchn_mask)));
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 230f440f14..e8c6045d72 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -497,6 +497,7 @@  static void setup_ports(struct domain *d)
 
         evtchn = evtchn_from_port(d, port);
 
+        ASSERT(d->shared_info.virt);
         if ( guest_test_bit(d, port, &shared_info(d, evtchn_pending)) )
             evtchn->pending = 1;
 
diff --git a/xen/common/time.c b/xen/common/time.c
index 58fa9abc40..938226c7b1 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -99,6 +99,9 @@  void update_domain_wallclock_time(struct domain *d)
     uint32_t *wc_version;
     uint64_t sec;
 
+    if ( d->is_dying )
+        return;
+
     spin_lock(&wc_lock);
 
     wc_version = &shared_info(d, wc_version);