[3/3] x86 / vmx: use a 'normal' domheap page for APIC_DEFAULT_PHYS_BASE
diff mbox series

Message ID 20200121120009.1767-4-pdurrant@amazon.com
State Superseded
Headers show
Series
  • purge free_shared_domheap_page()
Related show

Commit Message

Paul Durrant Jan. 21, 2020, noon UTC
vmx_alloc_vlapic_mapping() currently contains some very odd looking code
that allocates a MEMF_no_owner domheap page and then shares with the guest
as if it were a xenheap page. This then requires vmx_free_vlapic_mapping()
to call a special function in the mm code: free_shared_domheap_page().

By using a 'normal' domheap page (i.e. by not passing MEMF_no_owner to
alloc_domheap_page()), the odd looking code in vmx_alloc_vlapic_mapping()
can simply use get_page_and_type() to set up a writable mapping before
insertion in the P2M and vmx_free_vlapic_mapping() can simply release the
page using put_page_alloc_ref() followed by put_page_and_type(). This
then allows free_shared_domheap_page() to be purged.

There is, however, some fall-out from this simplification:

- alloc_domheap_page() will now call assign_pages() and run into the fact
  that 'max_pages' is not set until some time after domain_create(). To
  avoid an allocation failure, assign_pages() is modified to ignore the
  max_pages limit if 'creation_finished' is false. That value is not set
  to true until domain_unpause_by_systemcontroller() is called, and thus
  the guest cannot run (and hence cause memory allocation) until
  creation_finished is set to true.

- Because the domheap page is no longer a pseudo-xenheap page, the
  reference counting will prevent the domain from being destroyed. Thus
  the call to vmx_free_vlapic_mapping() is moved from the
  domain_destroy() method into the domain_relinquish_resources() method.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien@xen.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++-------
 xen/arch/x86/mm.c          | 10 ----------
 xen/common/page_alloc.c    |  3 ++-
 xen/include/asm-x86/mm.h   |  2 --
 4 files changed, 24 insertions(+), 20 deletions(-)

Comments

Julien Grall Jan. 21, 2020, 12:29 p.m. UTC | #1
Hi,

On 21/01/2020 12:00, Paul Durrant wrote:
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 919a270587..ef327072ed 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2269,7 +2269,8 @@ int assign_pages(
>   
>       if ( !(memflags & MEMF_no_refcount) )
>       {
> -        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> +        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> +             d->creation_finished )

This is not entirely obvious to me how this is safe. What would happen 
if d->creation_finished is set on another CPU at the same time? At least 
on Arm, this may not be seen directly.

I guess the problem would not only happen in this use case (I am more 
concerned in the physmap code), but it would be good to document how it 
is meant to be safe to use.

However, AFAIU, the only reason for the check to be here is because 
d->max_pages is set quite late. How about setting max_pages as part of 
the domain_create hypercall?

Cheers,
Durrant, Paul Jan. 21, 2020, 12:37 p.m. UTC | #2
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Julien Grall
> Sent: 21 January 2020 12:29
> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
> Cc: Kevin Tian <kevin.tian@intel.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Jun Nakajima <jun.nakajima@intel.com>; Wei Liu
> <wl@xen.org>; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; George
> Dunlap <George.Dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>;
> Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [PATCH 3/3] x86 / vmx: use a 'normal' domheap
> page for APIC_DEFAULT_PHYS_BASE
> 
> Hi,
> 
> On 21/01/2020 12:00, Paul Durrant wrote:
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > index 919a270587..ef327072ed 100644
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2269,7 +2269,8 @@ int assign_pages(
> >
> >       if ( !(memflags & MEMF_no_refcount) )
> >       {
> > -        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> > +        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> > +             d->creation_finished )
> 
> This is not entirely obvious to me how this is safe. What would happen
> if d->creation_finished is set on another CPU at the same time? At least
> on Arm, this may not be seen directly.
> 
> I guess the problem would not only happen in this use case (I am more
> concerned in the physmap code), but it would be good to document how it
> is meant to be safe to use.
> 
> However, AFAIU, the only reason for the check to be here is because
> d->max_pages is set quite late. How about setting max_pages as part of
> the domain_create hypercall?

That would be useful but certainly more invasive. There's no way a guest vcpu can see creation_finished set to true as it is still paused. The only concern would be a stub domain causing domheap pages to be allocated on behalf of the guest, and can we not trust a stub domain until it's guest has been unpaused (since there is no scope for the guest to attack it until then)?

  Paul

> 
> Cheers,
> 
> --
> Julien Grall
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
Tian, Kevin Jan. 22, 2020, 3:19 a.m. UTC | #3
> From: Paul Durrant <pdurrant@amazon.com>
> Sent: Tuesday, January 21, 2020 8:00 PM
> 
> vmx_alloc_vlapic_mapping() currently contains some very odd looking code
> that allocates a MEMF_no_owner domheap page and then shares with the
> guest
> as if it were a xenheap page. This then requires vmx_free_vlapic_mapping()
> to call a special function in the mm code: free_shared_domheap_page().
> 
> By using a 'normal' domheap page (i.e. by not passing MEMF_no_owner to
> alloc_domheap_page()), the odd looking code in vmx_alloc_vlapic_mapping()
> can simply use get_page_and_type() to set up a writable mapping before
> insertion in the P2M and vmx_free_vlapic_mapping() can simply release the
> page using put_page_alloc_ref() followed by put_page_and_type(). This
> then allows free_shared_domheap_page() to be purged.

I doubt whether using a normal domheap page is a right thing in concept.
Note the APIC access page is the backend for virtual LAPIC_BASE which is 
a MMIO range from guest p.o.v. 

> 
> There is, however, some fall-out from this simplification:
> 
> - alloc_domheap_page() will now call assign_pages() and run into the fact
>   that 'max_pages' is not set until some time after domain_create(). To
>   avoid an allocation failure, assign_pages() is modified to ignore the
>   max_pages limit if 'creation_finished' is false. That value is not set
>   to true until domain_unpause_by_systemcontroller() is called, and thus
>   the guest cannot run (and hence cause memory allocation) until
>   creation_finished is set to true.

However, doing so opens the window of no guard of allocations in 
the whole VM creation time. I'm not sure whether it's a worthwhile
just for fixing a bad-looking but conceptually-correct code.

> 
> - Because the domheap page is no longer a pseudo-xenheap page, the
>   reference counting will prevent the domain from being destroyed. Thus
>   the call to vmx_free_vlapic_mapping() is moved from the
>   domain_destroy() method into the domain_relinquish_resources() method.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wl@xen.org>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++-------
>  xen/arch/x86/mm.c          | 10 ----------
>  xen/common/page_alloc.c    |  3 ++-
>  xen/include/asm-x86/mm.h   |  2 --
>  4 files changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 3fd3ac61e1..a2e6081485 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -421,10 +421,6 @@ static int vmx_domain_initialise(struct domain *d)
>  }
> 
>  static void vmx_domain_relinquish_resources(struct domain *d)
> -{
> -}
> -
> -static void vmx_domain_destroy(struct domain *d)
>  {
>      if ( !has_vlapic(d) )
>          return;
> @@ -432,6 +428,10 @@ static void vmx_domain_destroy(struct domain *d)
>      vmx_free_vlapic_mapping(d);
>  }
> 
> +static void vmx_domain_destroy(struct domain *d)
> +{
> +}
> +
>  static int vmx_vcpu_initialise(struct vcpu *v)
>  {
>      int rc;
> @@ -3034,12 +3034,22 @@ static int vmx_alloc_vlapic_mapping(struct
> domain *d)
>      if ( !cpu_has_vmx_virtualize_apic_accesses )
>          return 0;
> 
> -    pg = alloc_domheap_page(d, MEMF_no_owner);
> +    pg = alloc_domheap_page(d, 0);
>      if ( !pg )
>          return -ENOMEM;
> +
> +    if ( !get_page_and_type(pg, d, PGT_writable_page) )
> +    {
> +        /*
> +         * The domain can't possibly know about this page yet, so failure
> +         * here is a clear indication of something fishy going on.
> +         */
> +        domain_crash(d);
> +        return -ENODATA;
> +    }
> +
>      mfn = page_to_mfn(pg);
>      clear_domain_page(mfn);
> -    share_xen_page_with_guest(pg, d, SHARE_rw);
>      d->arch.hvm.vmx.apic_access_mfn = mfn;
> 
>      return set_mmio_p2m_entry(d,
> paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), mfn,
> @@ -3052,7 +3062,12 @@ static void vmx_free_vlapic_mapping(struct
> domain *d)
>      mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
> 
>      if ( !mfn_eq(mfn, INVALID_MFN) )
> -        free_shared_domheap_page(mfn_to_page(mfn));
> +    {
> +        struct page_info *pg = mfn_to_page(mfn);
> +
> +        put_page_alloc_ref(pg);
> +        put_page_and_type(pg);
> +    }
>  }
> 
>  static void vmx_install_vlapic_mapping(struct vcpu *v)
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 654190e9e9..2a6d2e8af9 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -496,16 +496,6 @@ void share_xen_page_with_guest(struct page_info
> *page, struct domain *d,
>      spin_unlock(&d->page_alloc_lock);
>  }
> 
> -void free_shared_domheap_page(struct page_info *page)
> -{
> -    put_page_alloc_ref(page);
> -    if ( !test_and_clear_bit(_PGC_xen_heap, &page->count_info) )
> -        ASSERT_UNREACHABLE();
> -    page->u.inuse.type_info = 0;
> -    page_set_owner(page, NULL);
> -    free_domheap_page(page);
> -}
> -
>  void make_cr3(struct vcpu *v, mfn_t mfn)
>  {
>      struct domain *d = v->domain;
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 919a270587..ef327072ed 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2269,7 +2269,8 @@ int assign_pages(
> 
>      if ( !(memflags & MEMF_no_refcount) )
>      {
> -        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> +        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> +             d->creation_finished )
>          {
>              gprintk(XENLOG_INFO, "Over-allocation for domain %u: "
>                      "%u > %u\n", d->domain_id,
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 2ca8882ad0..e429f38228 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -317,8 +317,6 @@ struct page_info
> 
>  #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
> 
> -extern void free_shared_domheap_page(struct page_info *page);
> -
>  #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>  extern unsigned long max_page;
>  extern unsigned long total_pages;
> --
> 2.20.1
Durrant, Paul Jan. 22, 2020, 11:25 a.m. UTC | #4
> -----Original Message-----
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: 22 January 2020 03:19
> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
> Cc: Nakajima, Jun <jun.nakajima@intel.com>; Jan Beulich
> <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu
> <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; George Dunlap
> <George.Dunlap@eu.citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>;
> Julien Grall <julien@xen.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>
> Subject: RE: [PATCH 3/3] x86 / vmx: use a 'normal' domheap page for
> APIC_DEFAULT_PHYS_BASE
> 
> > From: Paul Durrant <pdurrant@amazon.com>
> > Sent: Tuesday, January 21, 2020 8:00 PM
> >
> > vmx_alloc_vlapic_mapping() currently contains some very odd looking code
> > that allocates a MEMF_no_owner domheap page and then shares with the
> > guest
> > as if it were a xenheap page. This then requires
> vmx_free_vlapic_mapping()
> > to call a special function in the mm code: free_shared_domheap_page().
> >
> > By using a 'normal' domheap page (i.e. by not passing MEMF_no_owner to
> > alloc_domheap_page()), the odd looking code in
> vmx_alloc_vlapic_mapping()
> > can simply use get_page_and_type() to set up a writable mapping before
> > insertion in the P2M and vmx_free_vlapic_mapping() can simply release
> the
> > page using put_page_alloc_ref() followed by put_page_and_type(). This
> > then allows free_shared_domheap_page() to be purged.
> 
> I doubt whether using a normal domheap page is a right thing in concept.
> Note the APIC access page is the backend for virtual LAPIC_BASE which is
> a MMIO range from guest p.o.v.

Well, using a non-owned domheap page as a fake xenheap page just looks weird. Also 'special' code like this is historically a source of XSAs. IMO a straightforward domheap page here is totally appropriate; AFAICT the code is only as it is to work round the max_pages limit and the lack of a relinquish_resources() hvmfunc.

> 
> >
> > There is, however, some fall-out from this simplification:
> >
> > - alloc_domheap_page() will now call assign_pages() and run into the
> fact
> >   that 'max_pages' is not set until some time after domain_create(). To
> >   avoid an allocation failure, assign_pages() is modified to ignore the
> >   max_pages limit if 'creation_finished' is false. That value is not set
> >   to true until domain_unpause_by_systemcontroller() is called, and thus
> >   the guest cannot run (and hence cause memory allocation) until
> >   creation_finished is set to true.
> 
> However, doing so opens the window of no guard of allocations in
> the whole VM creation time. I'm not sure whether it's a worthwhile
> just for fixing a bad-looking but conceptually-correct code.
> 

I don’t think the code is conceptually correct. The lack of guard is only for pages assigned to that domain, and since the domain cannot possibly be running then I don't see it is a problem. That said, an alternative would be to have the domain creation code to set max_pages is to a minimal initial value sufficient to cover any alloc_domheap_pages() calls done by sub-functions.

  Paul

> >
> > - Because the domheap page is no longer a pseudo-xenheap page, the
> >   reference counting will prevent the domain from being destroyed. Thus
> >   the call to vmx_free_vlapic_mapping() is moved from the
> >   domain_destroy() method into the domain_relinquish_resources() method.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Jun Nakajima <jun.nakajima@intel.com>
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Wei Liu <wl@xen.org>
> > Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++-------
> >  xen/arch/x86/mm.c          | 10 ----------
> >  xen/common/page_alloc.c    |  3 ++-
> >  xen/include/asm-x86/mm.h   |  2 --
> >  4 files changed, 24 insertions(+), 20 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 3fd3ac61e1..a2e6081485 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -421,10 +421,6 @@ static int vmx_domain_initialise(struct domain *d)
> >  }
> >
> >  static void vmx_domain_relinquish_resources(struct domain *d)
> > -{
> > -}
> > -
> > -static void vmx_domain_destroy(struct domain *d)
> >  {
> >      if ( !has_vlapic(d) )
> >          return;
> > @@ -432,6 +428,10 @@ static void vmx_domain_destroy(struct domain *d)
> >      vmx_free_vlapic_mapping(d);
> >  }
> >
> > +static void vmx_domain_destroy(struct domain *d)
> > +{
> > +}
> > +
> >  static int vmx_vcpu_initialise(struct vcpu *v)
> >  {
> >      int rc;
> > @@ -3034,12 +3034,22 @@ static int vmx_alloc_vlapic_mapping(struct
> > domain *d)
> >      if ( !cpu_has_vmx_virtualize_apic_accesses )
> >          return 0;
> >
> > -    pg = alloc_domheap_page(d, MEMF_no_owner);
> > +    pg = alloc_domheap_page(d, 0);
> >      if ( !pg )
> >          return -ENOMEM;
> > +
> > +    if ( !get_page_and_type(pg, d, PGT_writable_page) )
> > +    {
> > +        /*
> > +         * The domain can't possibly know about this page yet, so
> failure
> > +         * here is a clear indication of something fishy going on.
> > +         */
> > +        domain_crash(d);
> > +        return -ENODATA;
> > +    }
> > +
> >      mfn = page_to_mfn(pg);
> >      clear_domain_page(mfn);
> > -    share_xen_page_with_guest(pg, d, SHARE_rw);
> >      d->arch.hvm.vmx.apic_access_mfn = mfn;
> >
> >      return set_mmio_p2m_entry(d,
> > paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), mfn,
> > @@ -3052,7 +3062,12 @@ static void vmx_free_vlapic_mapping(struct
> > domain *d)
> >      mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
> >
> >      if ( !mfn_eq(mfn, INVALID_MFN) )
> > -        free_shared_domheap_page(mfn_to_page(mfn));
> > +    {
> > +        struct page_info *pg = mfn_to_page(mfn);
> > +
> > +        put_page_alloc_ref(pg);
> > +        put_page_and_type(pg);
> > +    }
> >  }
> >
> >  static void vmx_install_vlapic_mapping(struct vcpu *v)
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 654190e9e9..2a6d2e8af9 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -496,16 +496,6 @@ void share_xen_page_with_guest(struct page_info
> > *page, struct domain *d,
> >      spin_unlock(&d->page_alloc_lock);
> >  }
> >
> > -void free_shared_domheap_page(struct page_info *page)
> > -{
> > -    put_page_alloc_ref(page);
> > -    if ( !test_and_clear_bit(_PGC_xen_heap, &page->count_info) )
> > -        ASSERT_UNREACHABLE();
> > -    page->u.inuse.type_info = 0;
> > -    page_set_owner(page, NULL);
> > -    free_domheap_page(page);
> > -}
> > -
> >  void make_cr3(struct vcpu *v, mfn_t mfn)
> >  {
> >      struct domain *d = v->domain;
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > index 919a270587..ef327072ed 100644
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2269,7 +2269,8 @@ int assign_pages(
> >
> >      if ( !(memflags & MEMF_no_refcount) )
> >      {
> > -        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> > +        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> > +             d->creation_finished )
> >          {
> >              gprintk(XENLOG_INFO, "Over-allocation for domain %u: "
> >                      "%u > %u\n", d->domain_id,
> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> > index 2ca8882ad0..e429f38228 100644
> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -317,8 +317,6 @@ struct page_info
> >
> >  #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
> >
> > -extern void free_shared_domheap_page(struct page_info *page);
> > -
> >  #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
> >  extern unsigned long max_page;
> >  extern unsigned long total_pages;
> > --
> > 2.20.1
Jan Beulich Jan. 22, 2020, 4:17 p.m. UTC | #5
On 21.01.2020 13:00, Paul Durrant wrote:
> vmx_alloc_vlapic_mapping() currently contains some very odd looking code
> that allocates a MEMF_no_owner domheap page and then shares with the guest
> as if it were a xenheap page. This then requires vmx_free_vlapic_mapping()
> to call a special function in the mm code: free_shared_domheap_page().
> 
> By using a 'normal' domheap page (i.e. by not passing MEMF_no_owner to
> alloc_domheap_page()), the odd looking code in vmx_alloc_vlapic_mapping()
> can simply use get_page_and_type() to set up a writable mapping before
> insertion in the P2M and vmx_free_vlapic_mapping() can simply release the
> page using put_page_alloc_ref() followed by put_page_and_type(). This
> then allows free_shared_domheap_page() to be purged.
> 
> There is, however, some fall-out from this simplification:
> 
> - alloc_domheap_page() will now call assign_pages() and run into the fact
>   that 'max_pages' is not set until some time after domain_create(). To
>   avoid an allocation failure, assign_pages() is modified to ignore the
>   max_pages limit if 'creation_finished' is false. That value is not set
>   to true until domain_unpause_by_systemcontroller() is called, and thus
>   the guest cannot run (and hence cause memory allocation) until
>   creation_finished is set to true.

But this check is also to guard against the tool stack (or possibly
the controlling stubdom) to cause excess allocation. I don't think
the checking should be undermined like this (and see also below).

Since certainly you've looked into this while creating the patch,
could you remind me why it is that this page needs to be owned (as
in its owner field set accordingly) by the guest? It's a helper
page only, after all.

> @@ -3034,12 +3034,22 @@ static int vmx_alloc_vlapic_mapping(struct domain *d)
>      if ( !cpu_has_vmx_virtualize_apic_accesses )
>          return 0;
>  
> -    pg = alloc_domheap_page(d, MEMF_no_owner);
> +    pg = alloc_domheap_page(d, 0);

Did you consider passing MEMF_no_refcount here, to avoid the
fiddling with assign_pages()? That'll in particular also
avoid ...

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2269,7 +2269,8 @@ int assign_pages(
>  
>      if ( !(memflags & MEMF_no_refcount) )
>      {
> -        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> +        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> +             d->creation_finished )
>          {
>              gprintk(XENLOG_INFO, "Over-allocation for domain %u: "
>                      "%u > %u\n", d->domain_id,

... invoking domain_adjust_tot_pages() right below here, which
is wrong for helper pages like this one (as it reduces the
amount the domain is actually permitted to allocate).

Jan
Durrant, Paul Jan. 22, 2020, 4:27 p.m. UTC | #6
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 22 January 2020 16:17
> To: Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: xen-devel@lists.xenproject.org; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> <roger.pau@citrix.com>; George Dunlap <George.Dunlap@eu.citrix.com>; Ian
> Jackson <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Konrad
> Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Subject: Re: [PATCH 3/3] x86 / vmx: use a 'normal' domheap page for
> APIC_DEFAULT_PHYS_BASE
> 
> On 21.01.2020 13:00, Paul Durrant wrote:
> > vmx_alloc_vlapic_mapping() currently contains some very odd looking code
> > that allocates a MEMF_no_owner domheap page and then shares with the
> guest
> > as if it were a xenheap page. This then requires
> vmx_free_vlapic_mapping()
> > to call a special function in the mm code: free_shared_domheap_page().
> >
> > By using a 'normal' domheap page (i.e. by not passing MEMF_no_owner to
> > alloc_domheap_page()), the odd looking code in
> vmx_alloc_vlapic_mapping()
> > can simply use get_page_and_type() to set up a writable mapping before
> > insertion in the P2M and vmx_free_vlapic_mapping() can simply release
> the
> > page using put_page_alloc_ref() followed by put_page_and_type(). This
> > then allows free_shared_domheap_page() to be purged.
> >
> > There is, however, some fall-out from this simplification:
> >
> > - alloc_domheap_page() will now call assign_pages() and run into the
> fact
> >   that 'max_pages' is not set until some time after domain_create(). To
> >   avoid an allocation failure, assign_pages() is modified to ignore the
> >   max_pages limit if 'creation_finished' is false. That value is not set
> >   to true until domain_unpause_by_systemcontroller() is called, and thus
> >   the guest cannot run (and hence cause memory allocation) until
> >   creation_finished is set to true.
> 
> But this check is also to guard against the tool stack (or possibly
> the controlling stubdom) to cause excess allocation. I don't think
> the checking should be undermined like this (and see also below).
>

Ok.
 
> Since certainly you've looked into this while creating the patch,
> could you remind me why it is that this page needs to be owned (as
> in its owner field set accordingly) by the guest? It's a helper
> page only, after all.
> 

Not sure why it was done that way. It's inserted into the guest P2M so having it owned by the guest seems like the right thing to do. A malicious guest could decrease-reservation it and I guess it avoids special-casing there.

> > @@ -3034,12 +3034,22 @@ static int vmx_alloc_vlapic_mapping(struct
> domain *d)
> >      if ( !cpu_has_vmx_virtualize_apic_accesses )
> >          return 0;
> >
> > -    pg = alloc_domheap_page(d, MEMF_no_owner);
> > +    pg = alloc_domheap_page(d, 0);
> 
> Did you consider passing MEMF_no_refcount here, to avoid the
> fiddling with assign_pages()? That'll in particular also
> avoid ...
> 

You remember what happened last time we did that (with the ioreq server page), right? That's why assign_pages() vetoes non-refcounted pages.

> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2269,7 +2269,8 @@ int assign_pages(
> >
> >      if ( !(memflags & MEMF_no_refcount) )
> >      {
> > -        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> > +        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> > +             d->creation_finished )
> >          {
> >              gprintk(XENLOG_INFO, "Over-allocation for domain %u: "
> >                      "%u > %u\n", d->domain_id,
> 
> ... invoking domain_adjust_tot_pages() right below here, which
> is wrong for helper pages like this one (as it reduces the
> amount the domain is actually permitted to allocate).
> 

True, but there is 'slop' to deal with things like the ioreq pages and I think this page is logically similar.

  Paul

Patch
diff mbox series

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 3fd3ac61e1..a2e6081485 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -421,10 +421,6 @@  static int vmx_domain_initialise(struct domain *d)
 }
 
 static void vmx_domain_relinquish_resources(struct domain *d)
-{
-}
-
-static void vmx_domain_destroy(struct domain *d)
 {
     if ( !has_vlapic(d) )
         return;
@@ -432,6 +428,10 @@  static void vmx_domain_destroy(struct domain *d)
     vmx_free_vlapic_mapping(d);
 }
 
+static void vmx_domain_destroy(struct domain *d)
+{
+}
+
 static int vmx_vcpu_initialise(struct vcpu *v)
 {
     int rc;
@@ -3034,12 +3034,22 @@  static int vmx_alloc_vlapic_mapping(struct domain *d)
     if ( !cpu_has_vmx_virtualize_apic_accesses )
         return 0;
 
-    pg = alloc_domheap_page(d, MEMF_no_owner);
+    pg = alloc_domheap_page(d, 0);
     if ( !pg )
         return -ENOMEM;
+
+    if ( !get_page_and_type(pg, d, PGT_writable_page) )
+    {
+        /*
+         * The domain can't possibly know about this page yet, so failure
+         * here is a clear indication of something fishy going on.
+         */
+        domain_crash(d);
+        return -ENODATA;
+    }
+
     mfn = page_to_mfn(pg);
     clear_domain_page(mfn);
-    share_xen_page_with_guest(pg, d, SHARE_rw);
     d->arch.hvm.vmx.apic_access_mfn = mfn;
 
     return set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), mfn,
@@ -3052,7 +3062,12 @@  static void vmx_free_vlapic_mapping(struct domain *d)
     mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
 
     if ( !mfn_eq(mfn, INVALID_MFN) )
-        free_shared_domheap_page(mfn_to_page(mfn));
+    {
+        struct page_info *pg = mfn_to_page(mfn);
+
+        put_page_alloc_ref(pg);
+        put_page_and_type(pg);
+    }
 }
 
 static void vmx_install_vlapic_mapping(struct vcpu *v)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 654190e9e9..2a6d2e8af9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -496,16 +496,6 @@  void share_xen_page_with_guest(struct page_info *page, struct domain *d,
     spin_unlock(&d->page_alloc_lock);
 }
 
-void free_shared_domheap_page(struct page_info *page)
-{
-    put_page_alloc_ref(page);
-    if ( !test_and_clear_bit(_PGC_xen_heap, &page->count_info) )
-        ASSERT_UNREACHABLE();
-    page->u.inuse.type_info = 0;
-    page_set_owner(page, NULL);
-    free_domheap_page(page);
-}
-
 void make_cr3(struct vcpu *v, mfn_t mfn)
 {
     struct domain *d = v->domain;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 919a270587..ef327072ed 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2269,7 +2269,8 @@  int assign_pages(
 
     if ( !(memflags & MEMF_no_refcount) )
     {
-        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
+        if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
+             d->creation_finished )
         {
             gprintk(XENLOG_INFO, "Over-allocation for domain %u: "
                     "%u > %u\n", d->domain_id,
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2ca8882ad0..e429f38228 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -317,8 +317,6 @@  struct page_info
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
-extern void free_shared_domheap_page(struct page_info *page);
-
 #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 extern unsigned long max_page;
 extern unsigned long total_pages;