[v2,REPOST,12/12] x86/hvm/ioreq: add a new mappable resource type...

Message ID 20170822145107.6877-13-paul.durrant@citrix.com (mailing list archive)
State New, archived

Commit Message

Paul Durrant Aug. 22, 2017, 2:51 p.m. UTC
... XENMEM_resource_ioreq_server

This patch adds support for a new resource type that can be mapped using
the XENMEM_acquire_resource memory op.

If an emulator makes use of this resource type then, instead of mapping
gfns, the IOREQ server will allocate pages from the heap. These pages
will never be present in the P2M of the guest at any point and so are
not vulnerable to any direct attack by the guest. They are only ever
accessible by Xen and any domain that has mapping privilege over the
guest (which may or may not be limited to the domain running the emulator).

NOTE: Use of the new resource type is not compatible with use of
      XEN_DMOP_get_ioreq_server_info unless the XEN_DMOP_no_gfns flag is
      set.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/hvm/ioreq.c        | 136 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm.c               |  27 ++++++++
 xen/include/asm-x86/hvm/ioreq.h |   2 +
 xen/include/public/hvm/dm_op.h  |   4 ++
 xen/include/public/memory.h     |   3 +
 5 files changed, 172 insertions(+)
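
A minimal sketch of how an emulator might consume the new resource type (illustrative only: the domid and frame_list fields, and the issue_memory_op() wrapper, are assumptions based on the wider series rather than code taken from this patch):

    /*
     * Illustrative emulator-side sketch, not part of this patch.
     * Frame index 0 is the buffered ioreq ring and index 1 the synchronous
     * ioreq structures (see hvm_get_ioreq_server_frame() below), so an
     * emulator with buffered ioreqs enabled asks for two frames.
     */
    xen_pfn_t frames[2];
    int rc;

    struct xen_mem_acquire_resource xmar = {
        .domid     = guest_domid,                  /* target (guest) domain - field name assumed */
        .type      = XENMEM_resource_ioreq_server, /* new resource type */
        .id        = ioservid,                     /* ioreq server id */
        .frame     = 0,                            /* start with the bufioreq page */
        .nr_frames = 2,                            /* assumes buffered ioreqs are enabled */
    };

    set_xen_guest_handle(xmar.frame_list, frames); /* frame_list name assumed */

    /* issue_memory_op() stands in for the emulator's usual hypercall wrapper. */
    rc = issue_memory_op(XENMEM_acquire_resource, &xmar);

The frames handed back are mapped directly by the privileged domain and, unlike the gfns returned by XEN_DMOP_get_ioreq_server_info, never appear in the guest's P2M.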

Comments

Roger Pau Monné Aug. 25, 2017, 9:32 a.m. UTC | #1
On Tue, Aug 22, 2017 at 03:51:06PM +0100, Paul Durrant wrote:
> ... XENMEM_resource_ioreq_server
> 
> This patch adds support for a new resource type that can be mapped using
> the XENMEM_acquire_resource memory op.
> 
> If an emulator makes use of this resource type then, instead of mapping
> gfns, the IOREQ server will allocate pages from the heap. These pages
> will never be present in the P2M of the guest at any point and so are
> not vulnerable to any direct attack by the guest. They are only ever
> accessible by Xen and any domain that has mapping privilege over the
> guest (which may or may not be limited to the domain running the emulator).
> 
> NOTE: Use of the new resource type is not compatible with use of
>       XEN_DMOP_get_ioreq_server_info unless the XEN_DMOP_no_gfns flag is
>       set.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/hvm/ioreq.c        | 136 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/mm.c               |  27 ++++++++
>  xen/include/asm-x86/hvm/ioreq.h |   2 +
>  xen/include/public/hvm/dm_op.h  |   4 ++
>  xen/include/public/memory.h     |   3 +
>  5 files changed, 172 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 795c198f95..9e6838dab6 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -231,6 +231,15 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
>      struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
>      int rc;
>  
> +    if ( iorp->page )
> +    {
> +        /* Make sure the page has not been allocated */
> +        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
> +            return -EPERM;
> +
> +        return 0;

EEXIST? (See comment below, which I think also applies here).

> +    }
> +
>      if ( d->is_dying )
>          return -EINVAL;
>  
> @@ -253,6 +262,60 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
>      return rc;
>  }
>  
> +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> +{
> +    struct domain *currd = current->domain;
> +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> +
> +    if ( iorp->page )
> +    {
> +        /* Make sure the page has not been mapped */
> +        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
> +            return -EPERM;
> +
> +        return 0;

Shouldn't this return EEXIST? Page has already been allocated by a
previous call AFAICT, and it seems like a possible error/misbehavior
to try to do it twice.

> +    }
> +
> +    /*
> +     * Allocated IOREQ server pages are assigned to the emulating
> +     * domain, not the target domain. This is because the emulator is
> +     * likely to be destroyed after the target domain has been torn
> +     * down, and we must use MEMF_no_refcount otherwise page allocation
> +     * could fail if the emulating domain has already reached its
> +     * maximum allocation.
> +     */
> +    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);

I don't really like the fact that the page is not accounted to any
domain, but I can see the point in doing it like that (which you
argue for in the comment).

IIRC there were talks about tightening the accounting of memory
pages, so that ideally everything would be accounted for in the memory
assigned to the domain.

Just some random thought, but could the toolstack set aside some
memory pages (ie: not map them into the domain p2m), that could then
be used by this? (not asking you to do this here)

And how many pages are we expecting to use for each domain? I assume
the number will be quite low.

> +    if ( !iorp->page )
> +        return -ENOMEM;
> +
> +    get_page(iorp->page, currd);

Do you really need this get_page? (page is already assigned to currd).

> +
> +    iorp->va = __map_domain_page_global(iorp->page);
> +    if ( !iorp->va )
> +    {
> +        put_page(iorp->page);
> +        iorp->page = NULL;
> +        return -ENOMEM;
> +    }
> +
> +    clear_page(iorp->va);
> +    return 0;
> +}
> +
> +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> +{
> +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> +
> +    if ( !iorp->page )
> +        return;
> +
> +    unmap_domain_page_global(iorp->va);
> +    iorp->va = NULL;
> +
> +    put_page(iorp->page);
> +    iorp->page = NULL;
> +}
> +
>  bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
>  {
>      const struct hvm_ioreq_server *s;
> @@ -457,6 +520,27 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>      hvm_unmap_ioreq_gfn(s, false);
>  }
>  
> +static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
> +{
> +    int rc = -ENOMEM;
> +
> +    rc = hvm_alloc_ioreq_mfn(s, false);
> +
> +    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
> +        rc = hvm_alloc_ioreq_mfn(s, true);
> +
> +    if ( rc )
> +        hvm_free_ioreq_mfn(s, false);
> +
> +    return rc;
> +}
> +
> +static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
> +{
> +    hvm_free_ioreq_mfn(s, true);
> +    hvm_free_ioreq_mfn(s, false);
> +}
> +
>  static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
>  {
>      unsigned int i;
> @@ -583,7 +667,18 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
>  
>   fail_add:
>      hvm_ioreq_server_remove_all_vcpus(s);
> +
> +    /*
> +     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
> +     *       hvm_ioreq_server_free_pages() in that order.
> +     *       This is because the former will do nothing if the pages
> +     *       are not mapped, leaving the page to be freed by the latter.
> +     *       However if the pages are mapped then the former will set
> +     *       the page_info pointer to NULL, meaning the latter will do
> +     *       nothing.
> +     */
>      hvm_ioreq_server_unmap_pages(s);
> +    hvm_ioreq_server_free_pages(s);
>  
>      return rc;
>  }
> @@ -593,6 +688,7 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
>      ASSERT(!s->enabled);
>      hvm_ioreq_server_remove_all_vcpus(s);
>      hvm_ioreq_server_unmap_pages(s);
> +    hvm_ioreq_server_free_pages(s);
>      hvm_ioreq_server_free_rangesets(s);
>  }
>  
> @@ -745,6 +841,9 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
>              rc = hvm_ioreq_server_map_pages(s);
>              if ( rc )
>                  break;
> +
> +            gdprintk(XENLOG_INFO, "d%d ioreq server %u using gfns\n",
> +                     d->domain_id, s->id);
>          }
>  
>          if ( ioreq_gfn )
> @@ -767,6 +866,43 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +mfn_t hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
> +                                 unsigned int idx)
> +{
> +    struct hvm_ioreq_server *s;
> +    mfn_t mfn = INVALID_MFN;
> +
> +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        int rc;
> +
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;

s->is_default

> +
> +        if ( s->id != id )
> +            continue;
> +
> +        rc = hvm_ioreq_server_alloc_pages(s);
> +        if ( rc )
> +            break;
> +
> +        if ( idx == 0 )
> +            mfn = _mfn(page_to_mfn(s->bufioreq.page));
> +        else if ( idx == 1 )
> +            mfn = _mfn(page_to_mfn(s->ioreq.page));
> +
> +        break;
> +    }
> +
> +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    return mfn;
> +}
> +
>  int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
>                                       uint32_t type, uint64_t start,
>                                       uint64_t end)
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 4e86f0a2ab..3e845af0e4 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -122,6 +122,7 @@
>  #include <asm/fixmap.h>
>  #include <asm/io_apic.h>
>  #include <asm/pci.h>
> +#include <asm/hvm/ioreq.h>
>  
>  /* Mapping of the fixmap space needed early. */
>  l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> @@ -4744,6 +4745,27 @@ static int xenmem_acquire_grant_table(struct domain *d,
>      return 0;
>  }
>  
> +static int xenmem_acquire_ioreq_server(struct domain *d,
> +                                       unsigned int id,
> +                                       unsigned long frame,
> +                                       unsigned long nr_frames,
> +                                       unsigned long mfn_list[])

mfn_t maybe?

> +{
> +    unsigned int i;

Either nr_frames wants to be unsigned int, or this needs to be
unsigned long.

> +
> +    for ( i = 0; i < nr_frames; i++ )
> +    {
> +        mfn_t mfn = hvm_get_ioreq_server_frame(d, id, frame + i);

Here you use unsigned long as the last parameter to
hvm_get_ioreq_server_frame, while the function takes an unsigned int.

Thanks, Roger.
Paul Durrant Aug. 25, 2017, 9:46 a.m. UTC | #2
> -----Original Message-----
> From: Roger Pau Monne
> Sent: 25 August 2017 10:32
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
> new mappable resource type...
> 
> On Tue, Aug 22, 2017 at 03:51:06PM +0100, Paul Durrant wrote:
> > ... XENMEM_resource_ioreq_server
> >
> > This patch adds support for a new resource type that can be mapped using
> > the XENMEM_acquire_resource memory op.
> >
> > If an emulator makes use of this resource type then, instead of mapping
> > gfns, the IOREQ server will allocate pages from the heap. These pages
> > will never be present in the P2M of the guest at any point and so are
> > not vulnerable to any direct attack by the guest. They are only ever
> > accessible by Xen and any domain that has mapping privilege over the
> > guest (which may or may not be limited to the domain running the
> emulator).
> >
> > NOTE: Use of the new resource type is not compatible with use of
> >       XEN_DMOP_get_ioreq_server_info unless the XEN_DMOP_no_gfns
> flag is
> >       set.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Tim Deegan <tim@xen.org>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  xen/arch/x86/hvm/ioreq.c        | 136
> ++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/x86/mm.c               |  27 ++++++++
> >  xen/include/asm-x86/hvm/ioreq.h |   2 +
> >  xen/include/public/hvm/dm_op.h  |   4 ++
> >  xen/include/public/memory.h     |   3 +
> >  5 files changed, 172 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> > index 795c198f95..9e6838dab6 100644
> > --- a/xen/arch/x86/hvm/ioreq.c
> > +++ b/xen/arch/x86/hvm/ioreq.c
> > @@ -231,6 +231,15 @@ static int hvm_map_ioreq_gfn(struct
> hvm_ioreq_server *s, bool buf)
> >      struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> >      int rc;
> >
> > +    if ( iorp->page )
> > +    {
> > +        /* Make sure the page has not been allocated */
> > +        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
> > +            return -EPERM;
> > +
> > +        return 0;
> 
> EEXIST? (See comment below, which I think also applies here).
> 
> > +    }
> > +
> >      if ( d->is_dying )
> >          return -EINVAL;
> >
> > @@ -253,6 +262,60 @@ static int hvm_map_ioreq_gfn(struct
> hvm_ioreq_server *s, bool buf)
> >      return rc;
> >  }
> >
> > +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> > +{
> > +    struct domain *currd = current->domain;
> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > +
> > +    if ( iorp->page )
> > +    {
> > +        /* Make sure the page has not been mapped */
> > +        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
> > +            return -EPERM;
> > +
> > +        return 0;
> 
> Shouldn't this return EEXIST? Page has already been allocated by a
> previous call AFAICT, and it seems like a possible error/misbehavior
> to try to do it twice.
> 

The checks are there to prevent a caller from trying to mix the legacy and new methods of mapping ioreq server pages, so EPERM (i.e. 'operation not permitted') seems like the correct error. I agree that it's not obvious at this inner level, but I do think this is right. I'm open to debate about this though.

> > +    }
> > +
> > +    /*
> > +     * Allocated IOREQ server pages are assigned to the emulating
> > +     * domain, not the target domain. This is because the emulator is
> > +     * likely to be destroyed after the target domain has been torn
> > +     * down, and we must use MEMF_no_refcount otherwise page
> allocation
> > +     * could fail if the emulating domain has already reached its
> > +     * maximum allocation.
> > +     */
> > +    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);
> 
> I don't really like the fact that the page is not accounted for any
> domain, but I can see the point in doing it like that (which you
> argument in the comment).
> 
> IIRC there where talks about tightening the accounting of memory
> pages, so that ideally everything would be accounted for in the memory
> assigned to the domain.
> 
> Just some random through, but could the toolstack set aside some
> memory pages (ie: not map them into the domain p2m), that could then
> be used by this? (not asking you to do this here)
> 
> And how many pages are we expecting to use for each domain? I assume
> the number will be quite low.
> 

Yes, I agree the use of MEMF_no_refcount is not ideal and you do highlight an issue: I don't think there is currently an upper limit on the number of ioreq servers, so an emulating domain could exhaust memory using the new scheme. I'll need to introduce a limit to avoid that.

> > +    if ( !iorp->page )
> > +        return -ENOMEM;
> > +
> > +    get_page(iorp->page, currd);
> 
> Do you really need this get_page? (page is already assigned to currd).
> 

Hmm. I thought MEMF_no_refcount modified that behaviour, but I may have got confused with MEMF_no_owner.

> > +
> > +    iorp->va = __map_domain_page_global(iorp->page);
> > +    if ( !iorp->va )
> > +    {
> > +        put_page(iorp->page);
> > +        iorp->page = NULL;
> > +        return -ENOMEM;
> > +    }
> > +
> > +    clear_page(iorp->va);
> > +    return 0;
> > +}
> > +
> > +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> > +{
> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > +
> > +    if ( !iorp->page )
> > +        return;
> > +
> > +    unmap_domain_page_global(iorp->va);
> > +    iorp->va = NULL;
> > +
> > +    put_page(iorp->page);
> > +    iorp->page = NULL;
> > +}
> > +
> >  bool is_ioreq_server_page(struct domain *d, const struct page_info
> *page)
> >  {
> >      const struct hvm_ioreq_server *s;
> > @@ -457,6 +520,27 @@ static void
> hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
> >      hvm_unmap_ioreq_gfn(s, false);
> >  }
> >
> > +static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
> > +{
> > +    int rc = -ENOMEM;
> > +
> > +    rc = hvm_alloc_ioreq_mfn(s, false);
> > +
> > +    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
> > +        rc = hvm_alloc_ioreq_mfn(s, true);
> > +
> > +    if ( rc )
> > +        hvm_free_ioreq_mfn(s, false);
> > +
> > +    return rc;
> > +}
> > +
> > +static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
> > +{
> > +    hvm_free_ioreq_mfn(s, true);
> > +    hvm_free_ioreq_mfn(s, false);
> > +}
> > +
> >  static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server
> *s)
> >  {
> >      unsigned int i;
> > @@ -583,7 +667,18 @@ static int hvm_ioreq_server_init(struct
> hvm_ioreq_server *s,
> >
> >   fail_add:
> >      hvm_ioreq_server_remove_all_vcpus(s);
> > +
> > +    /*
> > +     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
> > +     *       hvm_ioreq_server_free_pages() in that order.
> > +     *       This is because the former will do nothing if the pages
> > +     *       are not mapped, leaving the page to be freed by the latter.
> > +     *       However if the pages are mapped then the former will set
> > +     *       the page_info pointer to NULL, meaning the latter will do
> > +     *       nothing.
> > +     */
> >      hvm_ioreq_server_unmap_pages(s);
> > +    hvm_ioreq_server_free_pages(s);
> >
> >      return rc;
> >  }
> > @@ -593,6 +688,7 @@ static void hvm_ioreq_server_deinit(struct
> hvm_ioreq_server *s)
> >      ASSERT(!s->enabled);
> >      hvm_ioreq_server_remove_all_vcpus(s);
> >      hvm_ioreq_server_unmap_pages(s);
> > +    hvm_ioreq_server_free_pages(s);
> >      hvm_ioreq_server_free_rangesets(s);
> >  }
> >
> > @@ -745,6 +841,9 @@ int hvm_get_ioreq_server_info(struct domain *d,
> ioservid_t id,
> >              rc = hvm_ioreq_server_map_pages(s);
> >              if ( rc )
> >                  break;
> > +
> > +            gdprintk(XENLOG_INFO, "d%d ioreq server %u using gfns\n",
> > +                     d->domain_id, s->id);
> >          }
> >
> >          if ( ioreq_gfn )
> > @@ -767,6 +866,43 @@ int hvm_get_ioreq_server_info(struct domain *d,
> ioservid_t id,
> >      return rc;
> >  }
> >
> > +mfn_t hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
> > +                                 unsigned int idx)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    mfn_t mfn = INVALID_MFN;
> > +
> > +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        int rc;
> > +
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> 
> s->is_default
> 
> > +
> > +        if ( s->id != id )
> > +            continue;
> > +
> > +        rc = hvm_ioreq_server_alloc_pages(s);
> > +        if ( rc )
> > +            break;
> > +
> > +        if ( idx == 0 )
> > +            mfn = _mfn(page_to_mfn(s->bufioreq.page));
> > +        else if ( idx == 1 )
> > +            mfn = _mfn(page_to_mfn(s->ioreq.page));
> > +
> > +        break;
> > +    }
> > +
> > +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    return mfn;
> > +}
> > +
> >  int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> >                                       uint32_t type, uint64_t start,
> >                                       uint64_t end)
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 4e86f0a2ab..3e845af0e4 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -122,6 +122,7 @@
> >  #include <asm/fixmap.h>
> >  #include <asm/io_apic.h>
> >  #include <asm/pci.h>
> > +#include <asm/hvm/ioreq.h>
> >
> >  /* Mapping of the fixmap space needed early. */
> >  l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
> > @@ -4744,6 +4745,27 @@ static int xenmem_acquire_grant_table(struct
> domain *d,
> >      return 0;
> >  }
> >
> > +static int xenmem_acquire_ioreq_server(struct domain *d,
> > +                                       unsigned int id,
> > +                                       unsigned long frame,
> > +                                       unsigned long nr_frames,
> > +                                       unsigned long mfn_list[])
> 
> mfn_t maybe?

No. The list is mapped in so use of typesafe variants is not appropriate.

> 
> > +{
> > +    unsigned int i;
> 
> Either nr_frames wants to be unsigned int, or this needs to be
> unsigned long.

Not sure which. I'll check.

> 
> > +
> > +    for ( i = 0; i < nr_frames; i++ )
> > +    {
> > +        mfn_t mfn = hvm_get_ioreq_server_frame(d, id, frame + i);
> 
> Here you use unsigned long as the last parameter to
> hvm_get_ioreq_server_frame, while the function takes an unsigned int.

Ok.

Thanks,

  Paul

> 
> Thanks, Roger.
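
As a reader's aid to the -EPERM discussion above, the guards in hvm_map_ioreq_gfn() and hvm_alloc_ioreq_mfn() can be read as a small state check; the helper below (its name is invented here) is an illustrative restatement, not code from the patch:

    /*
     * Each ioreq page is in one of three states, and each setup path
     * refuses to touch a page set up by the other scheme (hence -EPERM):
     *
     *   iorp->page == NULL                         : not set up yet
     *   iorp->page != NULL, gfn != INVALID_GFN     : legacy gfn-based mapping
     *   iorp->page != NULL, gfn == INVALID_GFN     : new heap-allocated page
     */
    static bool ioreq_page_is_heap_allocated(const struct hvm_ioreq_page *iorp)
    {
        return iorp->page && gfn_eq(iorp->gfn, INVALID_GFN);
    }
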
Roger Pau Monné Aug. 25, 2017, 9:53 a.m. UTC | #3
On Fri, Aug 25, 2017 at 10:46:48AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne
> > > +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> > > +{
> > > +    struct domain *currd = current->domain;
> > > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > > +
> > > +    if ( iorp->page )
> > > +    {
> > > +        /* Make sure the page has not been mapped */
> > > +        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
> > > +            return -EPERM;
> > > +
> > > +        return 0;
> > 
> > Shouldn't this return EEXIST? Page has already been allocated by a
> > previous call AFAICT, and it seems like a possible error/misbehavior
> > to try to do it twice.
> > 
> 
> The checks are there to prevent a caller from trying to mix the legacy and new methods of mapping ioreq server pages so EPERM (i.e. 'operation not permitted') seems like the correct error. I agree that it's not obvious, at this inner level, that I do think this is right. I'm open to debate about this though.

Oh, I was referring to the 'return 0;', not the 'return -EPERM;' (that
looks fine to me).

The 'return 0;' means that the page is already setup (if I understood
this correctly), that's why I was wondering whether it should return
-EEXIST instead.

Roger.
Paul Durrant Aug. 25, 2017, 9:58 a.m. UTC | #4
> -----Original Message-----
> From: Roger Pau Monne
> Sent: 25 August 2017 10:53
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
> new mappable resource type...
> 
> On Fri, Aug 25, 2017 at 10:46:48AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > > +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> > > > +{
> > > > +    struct domain *currd = current->domain;
> > > > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > > > +
> > > > +    if ( iorp->page )
> > > > +    {
> > > > +        /* Make sure the page has not been mapped */
> > > > +        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
> > > > +            return -EPERM;
> > > > +
> > > > +        return 0;
> > >
> > > Shouldn't this return EEXIST? Page has already been allocated by a
> > > previous call AFAICT, and it seems like a possible error/misbehavior
> > > to try to do it twice.
> > >
> >
> > The checks are there to prevent a caller from trying to mix the legacy and
> new methods of mapping ioreq server pages so EPERM (i.e. 'operation not
> permitted') seems like the correct error. I agree that it's not obvious, at this
> inner level, that I do think this is right. I'm open to debate about this though.
> 
> Oh, I was referring to the 'return 0;', not the 'return -EPERM;' (that
> looks fine to me).
> 
> The 'return 0;' means that the page is already setup (if I understood
> this correctly), that's why I was wondering whether it should return
> -EEXIST instead.
> 

No, it has to be this way because of the 'on demand' nature of the allocation. If an emulator uses the new resource mapping scheme then the pages will be allocated on the first mapping attempt, but if the emulator unmaps and then re-maps, that should succeed, whereas if I returned -EEXIST here it would propagate back up through the second mapping hypercall.

  Paul

> Roger.
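
The 'on demand' behaviour Paul describes can be illustrated with a short hypervisor-side sequence (an added sketch using the functions from this patch, not test code from the series):

    /* First acquisition: hvm_ioreq_server_alloc_pages() allocates s->ioreq.page. */
    mfn_t first = hvm_get_ioreq_server_frame(d, id, 1);

    /* ... the emulator unmaps its mapping and later maps the resource again ... */

    /*
     * Second acquisition: hvm_alloc_ioreq_mfn() finds iorp->page already set
     * and returns 0 (not -EEXIST), so the same frame is handed back and the
     * re-map succeeds.
     */
    mfn_t second = hvm_get_ioreq_server_frame(d, id, 1);

    ASSERT(!mfn_eq(first, INVALID_MFN) && mfn_eq(first, second));

Returning -EEXIST at that inner level would instead propagate up through the second XENMEM_acquire_resource call and break the re-map.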
George Dunlap Aug. 29, 2017, 11:36 a.m. UTC | #5
On 08/25/2017 10:46 AM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Roger Pau Monne
>> Sent: 25 August 2017 10:32
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
>> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; George Dunlap
>> <George.Dunlap@citrix.com>; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Tim
>> (Xen.org) <tim@xen.org>; Jan Beulich <jbeulich@suse.com>
>> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
>> new mappable resource type...
>>
>> On Tue, Aug 22, 2017 at 03:51:06PM +0100, Paul Durrant wrote:
>>> ... XENMEM_resource_ioreq_server
>>>
>>> This patch adds support for a new resource type that can be mapped using
>>> the XENMEM_acquire_resource memory op.
>>>
>>> If an emulator makes use of this resource type then, instead of mapping
>>> gfns, the IOREQ server will allocate pages from the heap. These pages
>>> will never be present in the P2M of the guest at any point and so are
>>> not vulnerable to any direct attack by the guest. They are only ever
>>> accessible by Xen and any domain that has mapping privilege over the
>>> guest (which may or may not be limited to the domain running the
>> emulator).
>>>
>>> NOTE: Use of the new resource type is not compatible with use of
>>>       XEN_DMOP_get_ioreq_server_info unless the XEN_DMOP_no_gfns
>> flag is
>>>       set.
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> ---
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
>>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>> Cc: Tim Deegan <tim@xen.org>
>>> Cc: Wei Liu <wei.liu2@citrix.com>
>>> ---
>>>  xen/arch/x86/hvm/ioreq.c        | 136
>> ++++++++++++++++++++++++++++++++++++++++
>>>  xen/arch/x86/mm.c               |  27 ++++++++
>>>  xen/include/asm-x86/hvm/ioreq.h |   2 +
>>>  xen/include/public/hvm/dm_op.h  |   4 ++
>>>  xen/include/public/memory.h     |   3 +
>>>  5 files changed, 172 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>>> index 795c198f95..9e6838dab6 100644
>>> --- a/xen/arch/x86/hvm/ioreq.c
>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>> @@ -231,6 +231,15 @@ static int hvm_map_ioreq_gfn(struct
>> hvm_ioreq_server *s, bool buf)
>>>      struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
>>>      int rc;
>>>
>>> +    if ( iorp->page )
>>> +    {
>>> +        /* Make sure the page has not been allocated */
>>> +        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
>>> +            return -EPERM;
>>> +
>>> +        return 0;
>>
>> EEXIST? (See comment below, which I think also applies here).
>>
>>> +    }
>>> +
>>>      if ( d->is_dying )
>>>          return -EINVAL;
>>>
>>> @@ -253,6 +262,60 @@ static int hvm_map_ioreq_gfn(struct
>> hvm_ioreq_server *s, bool buf)
>>>      return rc;
>>>  }
>>>
>>> +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
>>> +{
>>> +    struct domain *currd = current->domain;
>>> +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
>>> +
>>> +    if ( iorp->page )
>>> +    {
>>> +        /* Make sure the page has not been mapped */
>>> +        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
>>> +            return -EPERM;
>>> +
>>> +        return 0;
>>
>> Shouldn't this return EEXIST? Page has already been allocated by a
>> previous call AFAICT, and it seems like a possible error/misbehavior
>> to try to do it twice.
>>
> 
> The checks are there to prevent a caller from trying to mix the legacy and new methods of mapping ioreq server pages so EPERM (i.e. 'operation not permitted') seems like the correct error. I agree that it's not obvious, at this inner level, that I do think this is right. I'm open to debate about this though.

-EBUSY then?

 -George
George Dunlap Aug. 29, 2017, 1:40 p.m. UTC | #6
On Fri, Aug 25, 2017 at 10:46 AM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> > +    /*
>> > +     * Allocated IOREQ server pages are assigned to the emulating
>> > +     * domain, not the target domain. This is because the emulator is
>> > +     * likely to be destroyed after the target domain has been torn
>> > +     * down, and we must use MEMF_no_refcount otherwise page
>> allocation
>> > +     * could fail if the emulating domain has already reached its
>> > +     * maximum allocation.
>> > +     */
>> > +    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);
>>
>> I don't really like the fact that the page is not accounted for any
>> domain, but I can see the point in doing it like that (which you
>> argument in the comment).
>>
>> IIRC there where talks about tightening the accounting of memory
>> pages, so that ideally everything would be accounted for in the memory
>> assigned to the domain.
>>
>> Just some random through, but could the toolstack set aside some
>> memory pages (ie: not map them into the domain p2m), that could then
>> be used by this? (not asking you to do this here)
>>
>> And how many pages are we expecting to use for each domain? I assume
>> the number will be quite low.
>>
>
> Yes, I agree the use on MEMF_no_refcount is not ideal and you do highlight an issue: I don't think there is currently an upper limit on the number of ioreq servers so an emulating domain could exhaust memory using the new scheme. I'll need to introduce a limit to avoid that.

I'm not terribly happy with allocating out-of-band pages either.  One
of the advantages of the way things are done now (with the page
allocated to the guest VM) is that it is more resilient to unexpected
events:  If the domain dies before the emulator is done, you have a
"zombie" domain until the process exits.  But once the process exits
for any reason -- whether crashing or whatever -- the ref is freed and
the domain can finish dying.

What happens in this case if the dm process in dom0 is killed /
segfaults before it can unmap the page?  Will the page be properly
freed, or will it just leak?

I don't immediately see an advantage to doing what you're doing here,
instead of just calling hvm_alloc_ioreq_gfn().  The only reason you
give is that the domain is usually destroyed before the emulator
(meaning a short period of time where you have a 'zombie' domain), but
I don't see why that's an issue -- it doesn't seem like that's worth
the can of worms that it opens up.

 -George
Paul Durrant Aug. 29, 2017, 2:10 p.m. UTC | #7
> -----Original Message-----
> From: George Dunlap
> Sent: 29 August 2017 14:40
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Roger Pau Monne <roger.pau@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Ian Jackson <Ian.Jackson@citrix.com>; xen-
> devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
> new mappable resource type...
> 
> On Fri, Aug 25, 2017 at 10:46 AM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >> > +    /*
> >> > +     * Allocated IOREQ server pages are assigned to the emulating
> >> > +     * domain, not the target domain. This is because the emulator is
> >> > +     * likely to be destroyed after the target domain has been torn
> >> > +     * down, and we must use MEMF_no_refcount otherwise page
> >> allocation
> >> > +     * could fail if the emulating domain has already reached its
> >> > +     * maximum allocation.
> >> > +     */
> >> > +    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);
> >>
> >> I don't really like the fact that the page is not accounted for any
> >> domain, but I can see the point in doing it like that (which you
> >> argument in the comment).
> >>
> >> IIRC there where talks about tightening the accounting of memory
> >> pages, so that ideally everything would be accounted for in the memory
> >> assigned to the domain.
> >>
> >> Just some random through, but could the toolstack set aside some
> >> memory pages (ie: not map them into the domain p2m), that could then
> >> be used by this? (not asking you to do this here)
> >>
> >> And how many pages are we expecting to use for each domain? I assume
> >> the number will be quite low.
> >>
> >
> > Yes, I agree the use on MEMF_no_refcount is not ideal and you do
> highlight an issue: I don't think there is currently an upper limit on the
> number of ioreq servers so an emulating domain could exhaust memory
> using the new scheme. I'll need to introduce a limit to avoid that.
> 
> I'm not terribly happy with allocating out-of-band pages either.  One
> of the advantages of the way things are done now (with the page
> allocated to the guest VM) is that it is more resilient to unexpected
> events:  If the domain dies before the emulator is done, you have a
> "zombie" domain until the process exits.  But once the process exits
> for any reason -- whether crashing or whatever -- the ref is freed and
> the domain can finish dying.
> 
> What happens in this case if the dm process in dom0 is killed /
> segfaults before it can unmap the page?  Will the page be properly
> freed, or will it just leak?

The page is referenced by the ioreq server in the target domain, so it will be freed when the target domain is destroyed.

> 
> I don't immediately see an advantage to doing what you're doing here,
> instaed of just calling hvm_alloc_ioreq_gfn().  The only reason you
> give is that the domain is usually destroyed after the emulator
> (meaning a short period of time where you have a 'zombie' domain), but
> I don't see why that's an issue -- it doesn't seem like that's worth
> the can of worms that it opens up.
> 

The advantage is that the page is *never* in the guest P2M, so it cannot be mapped by the guest. The use of guest pages for communication between Xen and an emulator is a well-known attack surface and IIRC has already been the subject of at least one XSA. Until we have better infrastructure to account hypervisor memory to guests, I think using alloc_domheap_page() with MEMF_no_refcount is the best way.

  Paul

>  -George
George Dunlap Aug. 29, 2017, 2:26 p.m. UTC | #8
On Tue, Aug 29, 2017 at 3:10 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: George Dunlap
>> Sent: 29 August 2017 14:40
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> Cc: Roger Pau Monne <roger.pau@citrix.com>; Stefano Stabellini
>> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
>> <jbeulich@suse.com>; Ian Jackson <Ian.Jackson@citrix.com>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
>> new mappable resource type...
>>
>> On Fri, Aug 25, 2017 at 10:46 AM, Paul Durrant <Paul.Durrant@citrix.com>
>> wrote:

[snip]

>> I don't immediately see an advantage to doing what you're doing here,
>> instaed of just calling hvm_alloc_ioreq_gfn().  The only reason you
>> give is that the domain is usually destroyed before the emulator
>> (meaning a short period of time where you have a 'zombie' domain), but
>> I don't see why that's an issue -- it doesn't seem like that's worth
>> the can of worms that it opens up.
>>
>
> The advantage is that the page is *never* in the guest P2M so it cannot be mapped by the guest. The use of guest pages for communication between Xen and an emulator is a well-known attack surface and IIRC has already been the subject of at least one XSA. Until we have better infrastructure to account hypervisor memory to guests then I think using alloc_domheap_page() with MEMF_no_refcount is the best way.

...and hvm_alloc_ioreq_gfn() grabs pages which are in the guest's p2m
space but set aside for ioreq servers, so using those would make the
entire series pointless.  Sorry for missing that.

[snip]

>> >> > +    /*
>> >> > +     * Allocated IOREQ server pages are assigned to the emulating
>> >> > +     * domain, not the target domain. This is because the emulator is
>> >> > +     * likely to be destroyed after the target domain has been torn
>> >> > +     * down, and we must use MEMF_no_refcount otherwise page
>> >> allocation
>> >> > +     * could fail if the emulating domain has already reached its
>> >> > +     * maximum allocation.
>> >> > +     */
>> >> > +    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);
>> >>
>> >> I don't really like the fact that the page is not accounted for any
>> >> domain, but I can see the point in doing it like that (which you
>> >> argument in the comment).
>> >>
>> >> IIRC there where talks about tightening the accounting of memory
>> >> pages, so that ideally everything would be accounted for in the memory
>> >> assigned to the domain.
>> >>
>> >> Just some random through, but could the toolstack set aside some
>> >> memory pages (ie: not map them into the domain p2m), that could then
>> >> be used by this? (not asking you to do this here)
>> >>
>> >> And how many pages are we expecting to use for each domain? I assume
>> >> the number will be quite low.
>> >>
>> >
>> > Yes, I agree the use on MEMF_no_refcount is not ideal and you do
>> highlight an issue: I don't think there is currently an upper limit on the
>> number of ioreq servers so an emulating domain could exhaust memory
>> using the new scheme. I'll need to introduce a limit to avoid that.
>>
>> I'm not terribly happy with allocating out-of-band pages either.  One
>> of the advantages of the way things are done now (with the page
>> allocated to the guest VM) is that it is more resilient to unexpected
>> events:  If the domain dies before the emulator is done, you have a
>> "zombie" domain until the process exits.  But once the process exits
>> for any reason -- whether crashing or whatever -- the ref is freed and
>> the domain can finish dying.
>>
>> What happens in this case if the dm process in dom0 is killed /
>> segfaults before it can unmap the page?  Will the page be properly
>> freed, or will it just leak?
>
> The page is referenced by the ioreq server in the target domain, so it will be freed when the target domain is destroyed.

I don't understand how you're using the terms... I would have
interpreted 'target domain' to mean the guest VM to which emulated
devices are being provided, and 'ioreq server' means the process
(perhaps in dom0, perhaps in a stubdomain) which is providing the
emulated devices.

Did you mean that it's referenced by the ioreq_server struct in the
target domain, and so a put_page() will happen when the guest is
destroyed?

 -George
Paul Durrant Aug. 29, 2017, 2:31 p.m. UTC | #9
> -----Original Message-----
[snip]
> >> I'm not terribly happy with allocating out-of-band pages either.  One
> >> of the advantages of the way things are done now (with the page
> >> allocated to the guest VM) is that it is more resilient to unexpected
> >> events:  If the domain dies before the emulator is done, you have a
> >> "zombie" domain until the process exits.  But once the process exits
> >> for any reason -- whether crashing or whatever -- the ref is freed and
> >> the domain can finish dying.
> >>
> >> What happens in this case if the dm process in dom0 is killed /
> >> segfaults before it can unmap the page?  Will the page be properly
> >> freed, or will it just leak?
> >
> > The page is referenced by the ioreq server in the target domain, so it will
> be freed when the target domain is destroyed.
> 
> I don't understand how you're using the terms... I would have
> interpreted 'target domain' to me means the guest VM to which emulated
> devices are being provided, and 'ioreq server' means the process
> (perhaps in dom0, perhaps in a stubdomain) which is providing the
> emulated devices.
> 
> Did you mean that it's referenced by the ioreq_server struct in the
> target domain, and so a put_page() will happen when the guest is
> destroyed?

Terminology issues :-) By 'ioreq server' I mean the infrastructure in Xen, centred around struct ioreq_server. I refer to the dom0 process/stub domain/xengt module/whatever as the 'emulator'.
So, yes, the fact that the page is referenced in the ioreq server of the target domain means that a put_page() will happen when that domain is destroyed.

  Paul

> 
>  -George
George Dunlap Aug. 29, 2017, 2:38 p.m. UTC | #10
On Tue, Aug 29, 2017 at 3:31 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
> [snip]
>> >> I'm not terribly happy with allocating out-of-band pages either.  One
>> >> of the advantages of the way things are done now (with the page
>> >> allocated to the guest VM) is that it is more resilient to unexpected
>> >> events:  If the domain dies before the emulator is done, you have a
>> >> "zombie" domain until the process exits.  But once the process exits
>> >> for any reason -- whether crashing or whatever -- the ref is freed and
>> >> the domain can finish dying.
>> >>
>> >> What happens in this case if the dm process in dom0 is killed /
>> >> segfaults before it can unmap the page?  Will the page be properly
>> >> freed, or will it just leak?
>> >
>> > The page is referenced by the ioreq server in the target domain, so it will
>> be freed when the target domain is destroyed.
>>
>> I don't understand how you're using the terms... I would have
>> interpreted 'target domain' to me means the guest VM to which emulated
>> devices are being provided, and 'ioreq server' means the process
>> (perhaps in dom0, perhaps in a stubdomain) which is providing the
>> emulated devices.
>>
>> Did you mean that it's referenced by the ioreq_server struct in the
>> target domain, and so a put_page() will happen when the guest is
>> destroyed?
>
> Terminology issues :-) By 'ioreq server' I mean the infrastructure in Xen, centred around struct ioreq_server. I refer to the dom0 process/stub domain/xengt module/whatever as the 'emulator'.
> So, yes, the fact that the page is referenced in the ioreq server of the target domain means that a put_page() will happen when that domain is destroyed.

OK; in that case:

Acked-by: George Dunlap <george.dunlap@citrix.com>

I think that's the only other Ack you need from me.  Thanks for doing this work.

 -George
Paul Durrant Aug. 29, 2017, 2:49 p.m. UTC | #11
> -----Original Message-----
> From: George Dunlap
> Sent: 29 August 2017 15:38
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; xen-devel@lists.xenproject.org; Roger Pau
> Monne <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [PATCH v2 REPOST 12/12] x86/hvm/ioreq: add a
> new mappable resource type...
> 
> On Tue, Aug 29, 2017 at 3:31 PM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >> -----Original Message-----
> > [snip]
> >> >> I'm not terribly happy with allocating out-of-band pages either.  One
> >> >> of the advantages of the way things are done now (with the page
> >> >> allocated to the guest VM) is that it is more resilient to unexpected
> >> >> events:  If the domain dies before the emulator is done, you have a
> >> >> "zombie" domain until the process exits.  But once the process exits
> >> >> for any reason -- whether crashing or whatever -- the ref is freed and
> >> >> the domain can finish dying.
> >> >>
> >> >> What happens in this case if the dm process in dom0 is killed /
> >> >> segfaults before it can unmap the page?  Will the page be properly
> >> >> freed, or will it just leak?
> >> >
> >> > The page is referenced by the ioreq server in the target domain, so it
> will
> >> be freed when the target domain is destroyed.
> >>
> >> I don't understand how you're using the terms... I would have
> >> interpreted 'target domain' to me means the guest VM to which
> emulated
> >> devices are being provided, and 'ioreq server' means the process
> >> (perhaps in dom0, perhaps in a stubdomain) which is providing the
> >> emulated devices.
> >>
> >> Did you mean that it's referenced by the ioreq_server struct in the
> >> target domain, and so a put_page() will happen when the guest is
> >> destroyed?
> >
> > Terminology issues :-) By 'ioreq server' I mean the infrastructure in Xen,
> centred around struct ioreq_server. I refer to the dom0 process/stub
> domain/xengt module/whatever as the 'emulator'.
> > So, yes, the fact that the page is referenced in the ioreq server of the
> target domain means that a put_page() will happen when that domain is
> destroyed.
> 
> OK; in that case:
> 
> Acked-by: George Dunlap <george.dunlap@citrix.com>
> 

Cool, thanks.

> I think that's the only other Ack you need from me.  Thanks for doing this
> work.
> 

No probs. Cheers,

  Paul

>  -George

Patch

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 795c198f95..9e6838dab6 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -231,6 +231,15 @@  static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     int rc;
 
+    if ( iorp->page )
+    {
+        /* Make sure the page has not been allocated */
+        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
     if ( d->is_dying )
         return -EINVAL;
 
@@ -253,6 +262,60 @@  static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     return rc;
 }
 
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *currd = current->domain;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( iorp->page )
+    {
+        /* Make sure the page has not been mapped */
+        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    /*
+     * Allocated IOREQ server pages are assigned to the emulating
+     * domain, not the target domain. This is because the emulator is
+     * likely to be destroyed after the target domain has been torn
+     * down, and we must use MEMF_no_refcount otherwise page allocation
+     * could fail if the emulating domain has already reached its
+     * maximum allocation.
+     */
+    iorp->page = alloc_domheap_page(currd, MEMF_no_refcount);
+    if ( !iorp->page )
+        return -ENOMEM;
+
+    get_page(iorp->page, currd);
+
+    iorp->va = __map_domain_page_global(iorp->page);
+    if ( !iorp->va )
+    {
+        put_page(iorp->page);
+        iorp->page = NULL;
+        return -ENOMEM;
+    }
+
+    clear_page(iorp->va);
+    return 0;
+}
+
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( !iorp->page )
+        return;
+
+    unmap_domain_page_global(iorp->va);
+    iorp->va = NULL;
+
+    put_page(iorp->page);
+    iorp->page = NULL;
+}
+
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
 {
     const struct hvm_ioreq_server *s;
@@ -457,6 +520,27 @@  static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
     hvm_unmap_ioreq_gfn(s, false);
 }
 
+static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+{
+    int rc = -ENOMEM;
+
+    rc = hvm_alloc_ioreq_mfn(s, false);
+
+    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
+        rc = hvm_alloc_ioreq_mfn(s, true);
+
+    if ( rc )
+        hvm_free_ioreq_mfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+{
+    hvm_free_ioreq_mfn(s, true);
+    hvm_free_ioreq_mfn(s, false);
+}
+
 static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
 {
     unsigned int i;
@@ -583,7 +667,18 @@  static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
 
  fail_add:
     hvm_ioreq_server_remove_all_vcpus(s);
+
+    /*
+     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
+     *       hvm_ioreq_server_free_pages() in that order.
+     *       This is because the former will do nothing if the pages
+     *       are not mapped, leaving the page to be freed by the latter.
+     *       However if the pages are mapped then the former will set
+     *       the page_info pointer to NULL, meaning the latter will do
+     *       nothing.
+     */
     hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_free_pages(s);
 
     return rc;
 }
@@ -593,6 +688,7 @@  static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
     ASSERT(!s->enabled);
     hvm_ioreq_server_remove_all_vcpus(s);
     hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_free_pages(s);
     hvm_ioreq_server_free_rangesets(s);
 }
 
@@ -745,6 +841,9 @@  int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
             rc = hvm_ioreq_server_map_pages(s);
             if ( rc )
                 break;
+
+            gdprintk(XENLOG_INFO, "d%d ioreq server %u using gfns\n",
+                     d->domain_id, s->id);
         }
 
         if ( ioreq_gfn )
@@ -767,6 +866,43 @@  int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     return rc;
 }
 
+mfn_t hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                                 unsigned int idx)
+{
+    struct hvm_ioreq_server *s;
+    mfn_t mfn = INVALID_MFN;
+
+    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        int rc;
+
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id != id )
+            continue;
+
+        rc = hvm_ioreq_server_alloc_pages(s);
+        if ( rc )
+            break;
+
+        if ( idx == 0 )
+            mfn = _mfn(page_to_mfn(s->bufioreq.page));
+        else if ( idx == 1 )
+            mfn = _mfn(page_to_mfn(s->ioreq.page));
+
+        break;
+    }
+
+    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return mfn;
+}
+
 int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint64_t start,
                                      uint64_t end)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4e86f0a2ab..3e845af0e4 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -122,6 +122,7 @@ 
 #include <asm/fixmap.h>
 #include <asm/io_apic.h>
 #include <asm/pci.h>
+#include <asm/hvm/ioreq.h>
 
 /* Mapping of the fixmap space needed early. */
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
@@ -4744,6 +4745,27 @@  static int xenmem_acquire_grant_table(struct domain *d,
     return 0;
 }
 
+static int xenmem_acquire_ioreq_server(struct domain *d,
+                                       unsigned int id,
+                                       unsigned long frame,
+                                       unsigned long nr_frames,
+                                       unsigned long mfn_list[])
+{
+    unsigned int i;
+
+    for ( i = 0; i < nr_frames; i++ )
+    {
+        mfn_t mfn = hvm_get_ioreq_server_frame(d, id, frame + i);
+
+        if ( mfn_eq(mfn, INVALID_MFN) )
+            return -EINVAL;
+
+        mfn_list[i] = mfn_x(mfn);
+    }
+
+    return 0;
+}
+
 static int xenmem_acquire_resource(xen_mem_acquire_resource_t *xmar)
 {
     struct domain *d, *currd = current->domain;
@@ -4778,6 +4800,11 @@  static int xenmem_acquire_resource(xen_mem_acquire_resource_t *xmar)
                                         mfn_list);
         break;
 
+    case XENMEM_resource_ioreq_server:
+        rc = xenmem_acquire_ioreq_server(d, xmar->id, xmar->frame,
+                                         xmar->nr_frames, mfn_list);
+        break;
+
     default:
         rc = -EOPNOTSUPP;
         break;
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index 1829fcf43e..032aeb6fa9 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -31,6 +31,8 @@  int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
                               unsigned long *ioreq_gfn,
                               unsigned long *bufioreq_gfn,
                               evtchn_port_t *bufioreq_port);
+mfn_t hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                                 unsigned int idx);
 int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint64_t start,
                                      uint64_t end);
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index 9677bd74e7..59b6006910 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -90,6 +90,10 @@  struct xen_dm_op_create_ioreq_server {
  * the frame numbers passed back in gfns <ioreq_gfn> and <bufioreq_gfn>
  * respectively. (If the IOREQ Server is not handling buffered emulation
  * only <ioreq_gfn> will be valid).
+ *
+ * NOTE: To access the synchronous ioreq structures and buffered ioreq
+ *       ring, it is preferable to use the XENMEM_acquire_resource memory
+ *       op specifying resource type XENMEM_resource_ioreq_server.
  */
 #define XEN_DMOP_get_ioreq_server_info 2
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 9bf58e7384..716941dc0c 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -664,10 +664,13 @@  struct xen_mem_acquire_resource {
     uint16_t type;
 
 #define XENMEM_resource_grant_table 0
+#define XENMEM_resource_ioreq_server 1
 
     /*
      * IN - a type-specific resource identifier, which must be zero
      *      unless stated otherwise.
+     *
+     * type == XENMEM_resource_ioreq_server -> id == ioreq server id
      */
     uint32_t id;
     /* IN - number of (4K) frames of the resource to be mapped */