diff mbox series

[v12,6/6] introduce a 'passthrough' configuration option to xl.cfg...

Message ID 20190916092708.2624-1-paul.durrant@citrix.com (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Paul Durrant Sept. 16, 2019, 9:27 a.m. UTC
...and hence the ability to disable IOMMU mappings, and control EPT
sharing.

This patch introduces a new 'libxl_passthrough' enumeration into
libxl_domain_create_info. The value will be set by xl either when it parses
a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough
hardware specified for the domain.

If the value of the passthrough configuration option is 'disabled' then
the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain
flags, thus allowing the toolstack to control whether the domain gets
IOMMU mappings or not (where previously they were globally set).

If the value of the passthrough configuration option is 'sync_pt' then
a new 'iommu_opts' field in xen_domctl_createdomain will be set with the
value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default
set in iommu_hap_pt_share, thus allowing the toolstack to control whether
EPT sharing is used for the domain.

If the value of passthrough is 'enabled' then xl will choose an appropriate
default according to the type of domain and hardware support.

NOTE: The 'iommu_memkb' overhead in libxl_domain_build_info will now only
      be set if passthrough is 'sync_pt' (or xl has chosen this mode as
      a default).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Cc: David Scott <dave@recoil.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

Sending just this patch since the rest of the series is unchanged from
v11(.1).

Previously part of series https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg02267.html

v12:
 - Set passthrough default in libxl

v11:
 - Fixed abi-check runes

v10:
 - Added abi-check runes

v9:
 - Added the passthrough='enabled' option to xl
 - One cosmetic change in xen
 - Assume Jan's R-b stands since non-cosmetic changes are only in the
   toolstack

v7:
 - Added missing breaks
 - Added missing ocaml binding changes

v6:
 - Remove the libxl_physinfo() call since it's usefulness is limited to x86

v5:
 - Expand xen_domctl_createdomain flags field and hence bump interface
   version
 - Fix spelling mistakes in context line
---
 docs/man/xl.cfg.5.pod.in            |  56 ++++++++++
 tools/libxl/libxl.h                 |   9 ++
 tools/libxl/libxl_create.c          |  27 +++--
 tools/libxl/libxl_types.idl         |   7 ++
 tools/ocaml/libs/xc/xenctrl.ml      |   4 +
 tools/ocaml/libs/xc/xenctrl.mli     |   5 +
 tools/ocaml/libs/xc/xenctrl_stubs.c |  17 ++-
 tools/xl/xl_parse.c                 | 158 ++++++++++++++++++----------
 xen/arch/arm/domain.c               |  10 +-
 xen/arch/x86/domain.c               |   2 +-
 xen/common/domain.c                 |   7 ++
 xen/common/domctl.c                 |  13 ---
 xen/drivers/passthrough/iommu.c     |  13 ++-
 xen/include/public/domctl.h         |  10 +-
 xen/include/xen/iommu.h             |  15 +--
 15 files changed, 266 insertions(+), 87 deletions(-)

Comments

George Dunlap Sept. 16, 2019, 10:15 a.m. UTC | #1
On 9/16/19 10:27 AM, Paul Durrant wrote:
> ...and hence the ability to disable IOMMU mappings, and control EPT
> sharing.
> 
> This patch introduces a new 'libxl_passthrough' enumeration into
> libxl_domain_create_info. The value will be set by xl either when it parses
> a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough
> hardware specified for the domain.
> 
> If the value of the passthrough configuration option is 'disabled' then
> the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain
> flags, thus allowing the toolstack to control whether the domain gets
> IOMMU mappings or not (where previously they were globally set).
> 
> If the value of the passthrough configuration option is 'sync_pt' then
> a new 'iommu_opts' field in xen_domctl_createdomain will be set with the
> value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default
> set in iommu_hap_pt_share, thus allowing the toolstack to control whether
> EPT sharing is used for the domain.
> 
> If the value of passthrough is 'enabled' then xl will choose an appropriate
> default according to the type of domain and hardware support.

Minor suggestion: I prefer using a word like "auto" when you're letting
the computer decide something.

I'd also...

> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index c99d40307e..154d847fb9 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -605,6 +605,62 @@ option should only be used with a trusted device tree.
>  Note that the partial device tree should avoid using the phandle 65000
>  which is reserved by the toolstack.
>  
> +=item B<passthrough="STRING">
> +
> +Specify whether IOMMU mappings are enabled for the domain and hence whether
> +it will be enabled for passthrough hardware. Valid values for this option
> +are:
> +
> +=over 4
> +
> +=item B<disabled>
> +
> +IOMMU mappings are disabled for the domain and so hardware may not be
> +passed through.
> +
> +This option is the default if no passthrough hardware is specified in the
> +domain's configuration.
> +
> +=item B<sync_pt>
> +
> +This option means that IOMMU mappings will be synchronized with the
> +domain's P2M table as follows:
> +
> +For a PV domain, all writable pages assigned to the domain are identity
> +mapped by MFN in the IOMMU page table. Thus a device driver running in the
> +domain may program passthrough hardware for DMA using MFN values
> +(i.e. host/machine frame numbers) looked up in its P2M.
> +
> +For an HVM domain, all non-foreign RAM pages present in its P2M will be
> +mapped by GFN in the IOMMU page table. Thus a device driver running in the
> +domain may program passthrough hardware using GFN values (i.e. guest
> +physical frame numbers) without any further translation.
> +
> +This option is not currently available on Arm.
> +
> +=item B<share_pt>
> +
> +This option is unavailable for a PV domain. For an HVM domain, this option
> +means that the IOMMU will be programmed to directly reference the domain's
> +P2M table as its page table. From the point of view of a device driver
> +running in the domain this is functionally equivalent to B<sync_pt> but
> +places less load on the hypervisor and so should generally be selected in
> +preference. However, the availability of this option is hardware specific.
> +If B<xl info> reports B<virt_caps> containing B<iommu_hap_pt_share> then
> +this option may be used.
> +
> +=item B<enabled>
> +
> +This option enables IOMMU mappings and selects an appropriate default
> +operating mode. For HVM domains running on platforms where the option is
> +available, this is equivalent to B<share_pt>. Otherwise, and also for PV
> +domains, this options is equivalent to B<sync_pt>.

...put the option we want / expect people nearer the top (either first
or second).

Thanks,
 -George
Anthony PERARD Sept. 16, 2019, 2:05 p.m. UTC | #2
On Mon, Sep 16, 2019 at 10:27:08AM +0100, Paul Durrant wrote:
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 59dbcb50a0..7afae81432 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -30,6 +30,12 @@
>  int libxl__domain_create_info_setdefault(libxl__gc *gc,
>                                           libxl_domain_create_info *c_info)
>  {
> +    libxl_physinfo info;
> +    int rc = libxl_get_physinfo(CTX, &info);
> +
> +    if (rc)
> +            return rc;
> +

I think this hunk would be more readable if it was written:
    int rc;

    rc = libxl_get_physinfo(CTX, &info);
    if (rc)
        return rc;

Otherwise, the check for error is alone which might mean "we are in a
callback function and check for rc passed by parameter" or that it's a
stray check.

> @@ -62,6 +62,12 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc,
>      if (!c_info->ssidref)
>          c_info->ssidref = SECINITSID_DOMU;
>  
> +    if (info.cap_hvm_directio) {
> +        c_info->passthrough = ((c_info->type == LIBXL_DOMAIN_TYPE_PV) ||
> +                               !info.cap_iommu_hap_pt_share) ?
> +            LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT;
> +    }

I don't think you can do that. This will overwrite the value that was in
c_info before (that a user as set). The _setdefault() function is only
called after c_info have been filled by users of the libxl, it only has
to change the value if it was the default. c_info->passthrough has no
default value, so it's not possible to know what a user wants.

What about adding "default"==0 in libxl_passthrough enum?

>      return 0;
>  }
>  
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index d52c63b6b0..22f05711e3 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -263,6 +263,12 @@ libxl_vkb_backend = Enumeration("vkb_backend", [
>      (2, "LINUX")
>      ])
>  
> +libxl_passthrough = Enumeration("passthrough", [
> +    (0, "disabled"),
> +    (1, "sync_pt"),
> +    (2, "share_pt"),
> +    ])
> +
>  #
>  # Complex libxl types
>  #
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index 293f5f730e..4b2baa0403 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -1461,6 +1461,107 @@ void parse_config_data(const char *config_source,
> +    if (xlu_cfg_get_string(config, "passthrough", &buf, 0)) {
> +        buf = (d_config->num_pcidevs || d_config->num_dtdevs)
> +            ? "enabled" : "disabled";
> +    }
> +
> +    if (!strncmp(buf, "enabled", strlen(buf))) {

Do you intend to have "passthrough=''" been the equivalent of
"passthrough='enabled'" in the config file ?
(same with "e", "en", "ena", ...)

> +        /* Choose a suitable default */
> +        c_info->passthrough =
> +            (c_info->type == LIBXL_DOMAIN_TYPE_PV) || !iommu_hap_pt_share
> +            ? LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT;

Thanks,
Paul Durrant Sept. 16, 2019, 3:02 p.m. UTC | #3
> -----Original Message-----
> From: Anthony PERARD <anthony.perard@citrix.com>
> Sent: 16 September 2019 15:06
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Jan Beulich <jbeulich@suse.com>; Christian Lindig
> <christian.lindig@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu <wl@xen.org>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Julien Grall
> <julien.grall@arm.com>; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; David Scott <dave@recoil.org>; Volodymyr
> Babchuk <Volodymyr_Babchuk@epam.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [PATCH v12 6/6] introduce a 'passthrough' configuration option to xl.cfg...
> 
> On Mon, Sep 16, 2019 at 10:27:08AM +0100, Paul Durrant wrote:
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index 59dbcb50a0..7afae81432 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -30,6 +30,12 @@
> >  int libxl__domain_create_info_setdefault(libxl__gc *gc,
> >                                           libxl_domain_create_info *c_info)
> >  {
> > +    libxl_physinfo info;
> > +    int rc = libxl_get_physinfo(CTX, &info);
> > +
> > +    if (rc)
> > +            return rc;
> > +
> 
> I think this hunk would be more readable if it was written:
>     int rc;
> 
>     rc = libxl_get_physinfo(CTX, &info);
>     if (rc)
>         return rc;
> 
> Otherwise, the check for error is alone which might mean "we are in a
> callback function and check for rc passed by parameter" or that it's a
> stray check.
> 
> > @@ -62,6 +62,12 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc,
> >      if (!c_info->ssidref)
> >          c_info->ssidref = SECINITSID_DOMU;
> >
> > +    if (info.cap_hvm_directio) {
> > +        c_info->passthrough = ((c_info->type == LIBXL_DOMAIN_TYPE_PV) ||
> > +                               !info.cap_iommu_hap_pt_share) ?
> > +            LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT;
> > +    }
> 
> I don't think you can do that. This will overwrite the value that was in
> c_info before (that a user as set). The _setdefault() function is only
> called after c_info have been filled by users of the libxl, it only has
> to change the value if it was the default.

Oh, what a useful semantic.

> c_info->passthrough has no
> default value, so it's not possible to know what a user wants.
> 
> What about adding "default"==0 in libxl_passthrough enum?

I guess that's probably the only option, but there is still the problem of the page table memory overhead. I guess I'd have to assume the worst case for 'default'. That separation between create info and build info is really unhelpful.

> 
> >      return 0;
> >  }
> >
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index d52c63b6b0..22f05711e3 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -263,6 +263,12 @@ libxl_vkb_backend = Enumeration("vkb_backend", [
> >      (2, "LINUX")
> >      ])
> >
> > +libxl_passthrough = Enumeration("passthrough", [
> > +    (0, "disabled"),
> > +    (1, "sync_pt"),
> > +    (2, "share_pt"),
> > +    ])
> > +
> >  #
> >  # Complex libxl types
> >  #
> > diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> > index 293f5f730e..4b2baa0403 100644
> > --- a/tools/xl/xl_parse.c
> > +++ b/tools/xl/xl_parse.c
> > @@ -1461,6 +1461,107 @@ void parse_config_data(const char *config_source,
> > +    if (xlu_cfg_get_string(config, "passthrough", &buf, 0)) {
> > +        buf = (d_config->num_pcidevs || d_config->num_dtdevs)
> > +            ? "enabled" : "disabled";
> > +    }
> > +
> > +    if (!strncmp(buf, "enabled", strlen(buf))) {
> 
> Do you intend to have "passthrough=''" been the equivalent of
> "passthrough='enabled'" in the config file ?
> (same with "e", "en", "ena", ...)

No, I guess not; I'll fix that.

  Paul
Julien Grall Sept. 17, 2019, 9:07 a.m. UTC | #4
Hi Paul,

On 9/16/19 10:27 AM, Paul Durrant wrote:
> ...and hence the ability to disable IOMMU mappings, and control EPT
> sharing.
> 
> This patch introduces a new 'libxl_passthrough' enumeration into
> libxl_domain_create_info. The value will be set by xl either when it parses
> a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough
> hardware specified for the domain.
> 
> If the value of the passthrough configuration option is 'disabled' then
> the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain
> flags, thus allowing the toolstack to control whether the domain gets
> IOMMU mappings or not (where previously they were globally set).
> 
> If the value of the passthrough configuration option is 'sync_pt' then
> a new 'iommu_opts' field in xen_domctl_createdomain will be set with the
> value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default
> set in iommu_hap_pt_share, thus allowing the toolstack to control whether
> EPT sharing is used for the domain.
> 
> If the value of passthrough is 'enabled' then xl will choose an appropriate
> default according to the type of domain and hardware support.
> 
> NOTE: The 'iommu_memkb' overhead in libxl_domain_build_info will now only
>        be set if passthrough is 'sync_pt' (or xl has chosen this mode as
>        a default).
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Christian Lindig <christian.lindig@citrix.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wl@xen.org>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Anthony PERARD <anthony.perard@citrix.com>
> Cc: David Scott <dave@recoil.org>
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> 
> Sending just this patch since the rest of the series is unchanged from
> v11(.1).
> 
> Previously part of series https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg02267.html
> 
> v12:
>   - Set passthrough default in libxl
> 
> v11:
>   - Fixed abi-check runes
> 
> v10:
>   - Added abi-check runes
> 
> v9:
>   - Added the passthrough='enabled' option to xl
>   - One cosmetic change in xen
>   - Assume Jan's R-b stands since non-cosmetic changes are only in the
>     toolstack
> 
> v7:
>   - Added missing breaks
>   - Added missing ocaml binding changes
> 
> v6:
>   - Remove the libxl_physinfo() call since it's usefulness is limited to x86
> 
> v5:
>   - Expand xen_domctl_createdomain flags field and hence bump interface
>     version
>   - Fix spelling mistakes in context line
> ---
>   docs/man/xl.cfg.5.pod.in            |  56 ++++++++++
>   tools/libxl/libxl.h                 |   9 ++
>   tools/libxl/libxl_create.c          |  27 +++--
>   tools/libxl/libxl_types.idl         |   7 ++
>   tools/ocaml/libs/xc/xenctrl.ml      |   4 +
>   tools/ocaml/libs/xc/xenctrl.mli     |   5 +
>   tools/ocaml/libs/xc/xenctrl_stubs.c |  17 ++-
>   tools/xl/xl_parse.c                 | 158 ++++++++++++++++++----------
>   xen/arch/arm/domain.c               |  10 +-
>   xen/arch/x86/domain.c               |   2 +-
>   xen/common/domain.c                 |   7 ++
>   xen/common/domctl.c                 |  13 ---
>   xen/drivers/passthrough/iommu.c     |  13 ++-
>   xen/include/public/domctl.h         |  10 +-
>   xen/include/xen/iommu.h             |  15 +--
>   15 files changed, 266 insertions(+), 87 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index c99d40307e..154d847fb9 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -605,6 +605,62 @@ option should only be used with a trusted device tree.
>   Note that the partial device tree should avoid using the phandle 65000
>   which is reserved by the toolstack.
>   
> +=item B<passthrough="STRING">
> +
> +Specify whether IOMMU mappings are enabled for the domain and hence whether
> +it will be enabled for passthrough hardware. Valid values for this option
> +are:
> +
> +=over 4
> +
> +=item B<disabled>
> +
> +IOMMU mappings are disabled for the domain and so hardware may not be
> +passed through.
> +
> +This option is the default if no passthrough hardware is specified in the
> +domain's configuration.
> +
> +=item B<sync_pt>
> +
> +This option means that IOMMU mappings will be synchronized with the
> +domain's P2M table as follows:
> +
> +For a PV domain, all writable pages assigned to the domain are identity
> +mapped by MFN in the IOMMU page table. Thus a device driver running in the
> +domain may program passthrough hardware for DMA using MFN values
> +(i.e. host/machine frame numbers) looked up in its P2M.
> +
> +For an HVM domain, all non-foreign RAM pages present in its P2M will be

Apologies to make the comment this late (it probably can be addressed in 
a follow-up if the patch is good to go).

 From the user point of view, they have the choice between HVM and PVH.
The fact that PVH are based on HVM domain is a Xen internal (actually we 
were using PV for Àrm in the past).

So we may want to say HVM/PVH here and other instance below.

Other than that the Arm changes looks good to me:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,
Paul Durrant Sept. 18, 2019, 8:02 a.m. UTC | #5
> -----Original Message-----
> From: George Dunlap <george.dunlap@citrix.com>
> Sent: 16 September 2019 11:15
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Jan Beulich <jbeulich@suse.com>; Christian Lindig <christian.lindig@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Wei Liu <wl@xen.org>; Andrew Cooper <Andrew.Cooper3@citrix.com>; George
> Dunlap <George.Dunlap@citrix.com>; Julien Grall <julien.grall@arm.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>;
> Anthony Perard <anthony.perard@citrix.com>; David Scott <dave@recoil.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [PATCH v12 6/6] introduce a 'passthrough' configuration option to xl.cfg...
> 
> On 9/16/19 10:27 AM, Paul Durrant wrote:
> > ...and hence the ability to disable IOMMU mappings, and control EPT
> > sharing.
> >
> > This patch introduces a new 'libxl_passthrough' enumeration into
> > libxl_domain_create_info. The value will be set by xl either when it parses
> > a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough
> > hardware specified for the domain.
> >
> > If the value of the passthrough configuration option is 'disabled' then
> > the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain
> > flags, thus allowing the toolstack to control whether the domain gets
> > IOMMU mappings or not (where previously they were globally set).
> >
> > If the value of the passthrough configuration option is 'sync_pt' then
> > a new 'iommu_opts' field in xen_domctl_createdomain will be set with the
> > value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default
> > set in iommu_hap_pt_share, thus allowing the toolstack to control whether
> > EPT sharing is used for the domain.
> >
> > If the value of passthrough is 'enabled' then xl will choose an appropriate
> > default according to the type of domain and hardware support.
> 
> Minor suggestion: I prefer using a word like "auto" when you're letting
> the computer decide something.

Not sure I like that. I'll wait for a second opinion on that one.

> 
> I'd also...
> 
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index c99d40307e..154d847fb9 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -605,6 +605,62 @@ option should only be used with a trusted device tree.
> >  Note that the partial device tree should avoid using the phandle 65000
> >  which is reserved by the toolstack.
> >
> > +=item B<passthrough="STRING">
> > +
> > +Specify whether IOMMU mappings are enabled for the domain and hence whether
> > +it will be enabled for passthrough hardware. Valid values for this option
> > +are:
> > +
> > +=over 4
> > +
> > +=item B<disabled>
> > +
> > +IOMMU mappings are disabled for the domain and so hardware may not be
> > +passed through.
> > +
> > +This option is the default if no passthrough hardware is specified in the
> > +domain's configuration.
> > +
> > +=item B<sync_pt>
> > +
> > +This option means that IOMMU mappings will be synchronized with the
> > +domain's P2M table as follows:
> > +
> > +For a PV domain, all writable pages assigned to the domain are identity
> > +mapped by MFN in the IOMMU page table. Thus a device driver running in the
> > +domain may program passthrough hardware for DMA using MFN values
> > +(i.e. host/machine frame numbers) looked up in its P2M.
> > +
> > +For an HVM domain, all non-foreign RAM pages present in its P2M will be
> > +mapped by GFN in the IOMMU page table. Thus a device driver running in the
> > +domain may program passthrough hardware using GFN values (i.e. guest
> > +physical frame numbers) without any further translation.
> > +
> > +This option is not currently available on Arm.
> > +
> > +=item B<share_pt>
> > +
> > +This option is unavailable for a PV domain. For an HVM domain, this option
> > +means that the IOMMU will be programmed to directly reference the domain's
> > +P2M table as its page table. From the point of view of a device driver
> > +running in the domain this is functionally equivalent to B<sync_pt> but
> > +places less load on the hypervisor and so should generally be selected in
> > +preference. However, the availability of this option is hardware specific.
> > +If B<xl info> reports B<virt_caps> containing B<iommu_hap_pt_share> then
> > +this option may be used.
> > +
> > +=item B<enabled>
> > +
> > +This option enables IOMMU mappings and selects an appropriate default
> > +operating mode. For HVM domains running on platforms where the option is
> > +available, this is equivalent to B<share_pt>. Otherwise, and also for PV
> > +domains, this options is equivalent to B<sync_pt>.
> 
> ...put the option we want / expect people nearer the top (either first
> or second).
> 

Ok, I'll do that.

  Paul

> Thanks,
>  -George
diff mbox series

Patch

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index c99d40307e..154d847fb9 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -605,6 +605,62 @@  option should only be used with a trusted device tree.
 Note that the partial device tree should avoid using the phandle 65000
 which is reserved by the toolstack.
 
+=item B<passthrough="STRING">
+
+Specify whether IOMMU mappings are enabled for the domain and hence whether
+it will be enabled for passthrough hardware. Valid values for this option
+are:
+
+=over 4
+
+=item B<disabled>
+
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through.
+
+This option is the default if no passthrough hardware is specified in the
+domain's configuration.
+
+=item B<sync_pt>
+
+This option means that IOMMU mappings will be synchronized with the
+domain's P2M table as follows:
+
+For a PV domain, all writable pages assigned to the domain are identity
+mapped by MFN in the IOMMU page table. Thus a device driver running in the
+domain may program passthrough hardware for DMA using MFN values
+(i.e. host/machine frame numbers) looked up in its P2M.
+
+For an HVM domain, all non-foreign RAM pages present in its P2M will be
+mapped by GFN in the IOMMU page table. Thus a device driver running in the
+domain may program passthrough hardware using GFN values (i.e. guest
+physical frame numbers) without any further translation.
+
+This option is not currently available on Arm.
+
+=item B<share_pt>
+
+This option is unavailable for a PV domain. For an HVM domain, this option
+means that the IOMMU will be programmed to directly reference the domain's
+P2M table as its page table. From the point of view of a device driver
+running in the domain this is functionally equivalent to B<sync_pt> but
+places less load on the hypervisor and so should generally be selected in
+preference. However, the availability of this option is hardware specific.
+If B<xl info> reports B<virt_caps> containing B<iommu_hap_pt_share> then
+this option may be used.
+
+=item B<enabled>
+
+This option enables IOMMU mappings and selects an appropriate default
+operating mode. For HVM domains running on platforms where the option is
+available, this is equivalent to B<share_pt>. Otherwise, and also for PV
+domains, this options is equivalent to B<sync_pt>.
+
+This option is the default if passthrough hardware is specified in the
+domain's configuration.
+
+=back
+
 =back
 
 =head2 Devices
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 12545130df..e4b9c539b6 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -415,6 +415,15 @@ 
  */
 #define LIBXL_HAVE_BUILDINFO_IOMMU_MEMKB 1
 
+/*
+ * LIBXL_HAVE_CREATEINFO_PASSTHROUGH indicates that
+ * libxl_domain_create_info has a passthrough field (which is a
+ * libxl_passthrough enumeration) that indicates whether device pass-
+ * through is enabled for the domain and, if so, whether the IOMMU and
+ * HAP page tables may be shared or not.
+ */
+#define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 59dbcb50a0..7afae81432 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -30,6 +30,12 @@ 
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                          libxl_domain_create_info *c_info)
 {
+    libxl_physinfo info;
+    int rc = libxl_get_physinfo(CTX, &info);
+
+    if (rc)
+            return rc;
+
     if (!c_info->type) {
         LOG(ERROR, "domain type unspecified");
         return ERROR_INVAL;
@@ -38,12 +44,6 @@  int libxl__domain_create_info_setdefault(libxl__gc *gc,
     libxl__arch_domain_create_info_setdefault(gc, c_info);
 
     if (c_info->type != LIBXL_DOMAIN_TYPE_PV) {
-        libxl_physinfo info;
-        int rc = libxl_get_physinfo(CTX, &info);
-
-        if (rc)
-            return rc;
-
         if (info.cap_hap)
             libxl_defbool_setdefault(&c_info->hap, true);
         else if (info.cap_shadow)
@@ -62,6 +62,12 @@  int libxl__domain_create_info_setdefault(libxl__gc *gc,
     if (!c_info->ssidref)
         c_info->ssidref = SECINITSID_DOMU;
 
+    if (info.cap_hvm_directio) {
+        c_info->passthrough = ((c_info->type == LIBXL_DOMAIN_TYPE_PV) ||
+                               !info.cap_iommu_hap_pt_share) ?
+            LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT;
+    }
+
     return 0;
 }
 
@@ -578,6 +584,15 @@  int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
                 libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
         }
 
+        LOG(DETAIL, "passthrough: %s",
+            libxl_passthrough_to_string(info->passthrough));
+
+        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
+            create.flags |= XEN_DOMCTL_CDF_iommu;
+
+        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
+
         /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
         libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index d52c63b6b0..22f05711e3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -263,6 +263,12 @@  libxl_vkb_backend = Enumeration("vkb_backend", [
     (2, "LINUX")
     ])
 
+libxl_passthrough = Enumeration("passthrough", [
+    (0, "disabled"),
+    (1, "sync_pt"),
+    (2, "share_pt"),
+    ])
+
 #
 # Complex libxl types
 #
@@ -408,6 +414,7 @@  libxl_domain_create_info = Struct("domain_create_info",[
     ("pool_name",    string),
     ("run_hotplug_scripts",libxl_defbool),
     ("driver_domain",libxl_defbool),
+    ("passthrough",  libxl_passthrough),
     ], dir=DIR_IN)
 
 libxl_domain_restore_params = Struct("domain_restore_params", [
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index de4bae6012..e00a74d48d 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -65,11 +65,15 @@  type domain_create_flag =
 	| CDF_XS_DOMAIN
 	| CDF_IOMMU
 
+type domain_create_iommu_opts =
+	| IOMMU_NO_SHAREPT
+
 type domctl_create_config =
 {
 	ssidref: int32;
 	handle: string;
 	flags: domain_create_flag list;
+	iommu_opts: domain_create_iommu_opts list;
 	max_vcpus: int;
 	max_evtchn_port: int;
 	max_grant_frames: int;
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index c885e75895..0e7049d708 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -57,10 +57,15 @@  type domain_create_flag =
   | CDF_OOS_OFF
   | CDF_XS_DOMAIN
   | CDF_IOMMU
+
+type domain_create_iommu_opts =
+  | IOMMU_NO_SHAREPT
+
 type domctl_create_config = {
   ssidref: int32;
   handle: string;
   flags: domain_create_flag list;
+  iommu_opts: domain_create_iommu_opts list;
   max_vcpus: int;
   max_evtchn_port: int;
   max_grant_frames: int;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 0cdd873599..48f39f81d5 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -190,11 +190,12 @@  CAMLprim value stub_xc_domain_create(value xch, value config)
 #define VAL_SSIDREF             Field(config, 0)
 #define VAL_HANDLE              Field(config, 1)
 #define VAL_FLAGS               Field(config, 2)
-#define VAL_MAX_VCPUS           Field(config, 3)
-#define VAL_MAX_EVTCHN_PORT     Field(config, 4)
-#define VAL_MAX_GRANT_FRAMES    Field(config, 5)
-#define VAL_MAX_MAPTRACK_FRAMES Field(config, 6)
-#define VAL_ARCH                Field(config, 7)
+#define VAL_IOMMU_OPTS          Field(config, 3)
+#define VAL_MAX_VCPUS           Field(config, 4)
+#define VAL_MAX_EVTCHN_PORT     Field(config, 5)
+#define VAL_MAX_GRANT_FRAMES    Field(config, 6)
+#define VAL_MAX_MAPTRACK_FRAMES Field(config, 7)
+#define VAL_ARCH                Field(config, 8)
 
 	uint32_t domid = 0;
 	int result;
@@ -213,6 +214,11 @@  CAMLprim value stub_xc_domain_create(value xch, value config)
 		/* ! XEN_DOMCTL_CDF_ XEN_DOMCTL_CDF_MAX max */
 		(VAL_FLAGS);
 
+	cfg.iommu_opts = ocaml_list_to_c_bitmap
+		/* ! domain_create_iommu_opts IOMMU_ lc */
+		/* ! XEN_DOMCTL_IOMMU_ XEN_DOMCTL_IOMMU_MAX max */
+		(VAL_IOMMU_OPTS);
+
 	arch_domconfig = Field(VAL_ARCH, 0);
 	switch ( Tag_val(VAL_ARCH) )
 	{
@@ -247,6 +253,7 @@  CAMLprim value stub_xc_domain_create(value xch, value config)
 #undef VAL_MAX_GRANT_FRAMES
 #undef VAL_MAX_EVTCHN_PORT
 #undef VAL_MAX_VCPUS
+#undef VAL_IOMMU_OPTS
 #undef VAL_FLAGS
 #undef VAL_HANDLE
 #undef VAL_SSIDREF
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 293f5f730e..4b2baa0403 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1461,6 +1461,107 @@  void parse_config_data(const char *config_source,
         exit(1);
     }
 
+    if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
+        d_config->num_pcidevs = 0;
+        d_config->pcidevs = NULL;
+        for(i = 0; (buf = xlu_cfg_get_listitem (pcis, i)) != NULL; i++) {
+            libxl_device_pci *pcidev;
+
+            pcidev = ARRAY_EXTEND_INIT_NODEVID(d_config->pcidevs,
+                                               d_config->num_pcidevs,
+                                               libxl_device_pci_init);
+            pcidev->msitranslate = pci_msitranslate;
+            pcidev->power_mgmt = pci_power_mgmt;
+            pcidev->permissive = pci_permissive;
+            pcidev->seize = pci_seize;
+            /*
+             * Like other pci option, the per-device policy always follows
+             * the global policy by default.
+             */
+            pcidev->rdm_policy = b_info->u.hvm.rdm.policy;
+            e = xlu_pci_parse_bdf(config, pcidev, buf);
+            if (e) {
+                fprintf(stderr,
+                        "unable to parse PCI BDF `%s' for passthrough\n",
+                        buf);
+                exit(-e);
+            }
+        }
+        if (d_config->num_pcidevs && c_info->type == LIBXL_DOMAIN_TYPE_PV)
+            libxl_defbool_set(&b_info->u.pv.e820_host, true);
+    }
+
+    if (!xlu_cfg_get_list (config, "dtdev", &dtdevs, 0, 0)) {
+        d_config->num_dtdevs = 0;
+        d_config->dtdevs = NULL;
+        for (i = 0; (buf = xlu_cfg_get_listitem(dtdevs, i)) != NULL; i++) {
+            libxl_device_dtdev *dtdev;
+
+            dtdev = ARRAY_EXTEND_INIT_NODEVID(d_config->dtdevs,
+                                              d_config->num_dtdevs,
+                                              libxl_device_dtdev_init);
+
+            dtdev->path = strdup(buf);
+            if (dtdev->path == NULL) {
+                fprintf(stderr, "unable to duplicate string for dtdevs\n");
+                exit(-1);
+            }
+        }
+    }
+
+    if (xlu_cfg_get_string(config, "passthrough", &buf, 0)) {
+        buf = (d_config->num_pcidevs || d_config->num_dtdevs)
+            ? "enabled" : "disabled";
+    }
+
+    if (!strncmp(buf, "enabled", strlen(buf))) {
+        /* Choose a suitable default */
+        c_info->passthrough =
+            (c_info->type == LIBXL_DOMAIN_TYPE_PV) || !iommu_hap_pt_share
+            ? LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT;
+    } else {
+        libxl_passthrough o;
+
+        e = libxl_passthrough_from_string(buf, &o);
+        if (e) {
+            fprintf(stderr,
+                    "ERROR: unknown passthrough option '%s'\n",
+                    buf);
+            exit(-ERROR_FAIL);
+        }
+
+        switch (o) {
+        case LIBXL_PASSTHROUGH_DISABLED:
+            if (d_config->num_pcidevs || d_config->num_dtdevs) {
+                fprintf(stderr,
+                        "ERROR: passthrough disabled but devices are specified\n");
+                exit(-ERROR_FAIL);
+            }
+            break;
+        case LIBXL_PASSTHROUGH_SHARE_PT:
+            if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
+                fprintf(stderr,
+                        "ERROR: passthrough=\"share_pt\" not valid for PV domain\n");
+                exit(-ERROR_FAIL);
+            } else if (!iommu_hap_pt_share) {
+                fprintf(stderr,
+                        "ERROR: passthrough=\"share_pt\" not supported on this platform\n");
+                exit(-ERROR_FAIL);
+            }
+            break;
+        case LIBXL_PASSTHROUGH_SYNC_PT:
+            break;
+        }
+
+        c_info->passthrough = o;
+    }
+
+    if ((c_info->passthrough != LIBXL_PASSTHROUGH_DISABLED) && !iommu_enabled) {
+        fprintf(stderr,
+                "ERROR: passthrough not supported on this platform\n");
+        exit(-ERROR_FAIL);
+    }
+
     /* libxl_get_required_shadow_memory() and
      * libxl_get_required_iommu_memory() must be called after final values
      * (default or specified) for vcpus and memory are set, because the
@@ -1470,11 +1571,10 @@  void parse_config_data(const char *config_source,
         : libxl_get_required_shadow_memory(b_info->max_memkb,
                                            b_info->max_vcpus);
 
-    /* No IOMMU reservation is needed if either the IOMMU is disabled or it
-     * can share the P2M. */
-    b_info->iommu_memkb = (!iommu_enabled || iommu_hap_pt_share)
-        ? 0
-        : libxl_get_required_iommu_memory(b_info->max_memkb);
+    /* No IOMMU reservation is needed if passthrough mode is not 'sync_pt' */
+    b_info->iommu_memkb = (c_info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+        ? libxl_get_required_iommu_memory(b_info->max_memkb)
+        : 0;
 
     xlu_cfg_get_defbool(config, "nomigrate", &b_info->disable_migrate, 0);
 
@@ -2298,54 +2398,6 @@  skip_vfb:
         }
     }
 
-    if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
-        d_config->num_pcidevs = 0;
-        d_config->pcidevs = NULL;
-        for(i = 0; (buf = xlu_cfg_get_listitem (pcis, i)) != NULL; i++) {
-            libxl_device_pci *pcidev;
-
-            pcidev = ARRAY_EXTEND_INIT_NODEVID(d_config->pcidevs,
-                                               d_config->num_pcidevs,
-                                               libxl_device_pci_init);
-            pcidev->msitranslate = pci_msitranslate;
-            pcidev->power_mgmt = pci_power_mgmt;
-            pcidev->permissive = pci_permissive;
-            pcidev->seize = pci_seize;
-            /*
-             * Like other pci option, the per-device policy always follows
-             * the global policy by default.
-             */
-            pcidev->rdm_policy = b_info->u.hvm.rdm.policy;
-            e = xlu_pci_parse_bdf(config, pcidev, buf);
-            if (e) {
-                fprintf(stderr,
-                        "unable to parse PCI BDF `%s' for passthrough\n",
-                        buf);
-                exit(-e);
-            }
-        }
-        if (d_config->num_pcidevs && c_info->type == LIBXL_DOMAIN_TYPE_PV)
-            libxl_defbool_set(&b_info->u.pv.e820_host, true);
-    }
-
-    if (!xlu_cfg_get_list (config, "dtdev", &dtdevs, 0, 0)) {
-        d_config->num_dtdevs = 0;
-        d_config->dtdevs = NULL;
-        for (i = 0; (buf = xlu_cfg_get_listitem(dtdevs, i)) != NULL; i++) {
-            libxl_device_dtdev *dtdev;
-
-            dtdev = ARRAY_EXTEND_INIT_NODEVID(d_config->dtdevs,
-                                              d_config->num_dtdevs,
-                                              libxl_device_dtdev_init);
-
-            dtdev->path = strdup(buf);
-            if (dtdev->path == NULL) {
-                fprintf(stderr, "unable to duplicate string for dtdevs\n");
-                exit(-1);
-            }
-        }
-    }
-
     if (!xlu_cfg_get_list(config, "usbctrl", &usbctrls, 0, 0)) {
         d_config->num_usbctrls = 0;
         d_config->usbctrls = NULL;
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index ae13e47e86..61d35cd120 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -617,6 +617,14 @@  int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    /* The P2M table must always be shared between the CPU and the IOMMU */
+    if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept )
+    {
+        dprintk(XENLOG_INFO,
+                "Unsupported iommu option: XEN_DOMCTL_IOMMU_no_sharept\n");
+        return -EINVAL;
+    }
+
     /* Fill in the native GIC version, passed back to the toolstack. */
     if ( config->arch.gic_version == XEN_DOMCTL_CONFIG_GIC_NATIVE )
     {
@@ -677,7 +685,7 @@  int arch_domain_create(struct domain *d,
     ASSERT(config != NULL);
 
     /* p2m_init relies on some value initialized by the IOMMU subsystem */
-    if ( (rc = iommu_domain_init(d)) != 0 )
+    if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
 
     if ( (rc = p2m_init(d)) != 0 )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 4b0ad5e15d..fb8b397be2 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -611,7 +611,7 @@  int arch_domain_create(struct domain *d,
     if ( (rc = init_domain_irq_mapping(d)) != 0 )
         goto fail;
 
-    if ( (rc = iommu_domain_init(d)) != 0 )
+    if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
 
     psr_domain_init(d);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 4681f29c8b..0733ee8b0a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -308,6 +308,13 @@  static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    if ( !(config->flags & XEN_DOMCTL_CDF_iommu) && config->iommu_opts )
+    {
+        dprintk(XENLOG_INFO,
+                "IOMMU options specified but IOMMU not enabled\n");
+        return -EINVAL;
+    }
+
     if ( config->max_vcpus < 1 )
     {
         dprintk(XENLOG_INFO, "No vCPUS\n");
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 5dcfe3c8f6..6e6e9b9866 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -515,19 +515,6 @@  long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             rover = dom;
         }
 
-        /*
-         * For now, make sure the createdomain IOMMU flag is set if the
-         * IOMMU is enabled. When the flag comes under toolstack control
-         * this can go away.
-         */
-        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_iommu )
-        {
-            ASSERT_UNREACHABLE();
-            return -EINVAL;
-        }
-        if ( iommu_enabled )
-            op->u.createdomain.flags |= XEN_DOMCTL_CDF_iommu;
-
         d = domain_create(dom, &op->u.createdomain, false);
         if ( IS_ERR(d) )
         {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index a49346394d..4171c3cf6f 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -168,7 +168,7 @@  static void __hwdom_init check_hwdom_reqs(struct domain *d)
     iommu_hwdom_strict = true;
 }
 
-int iommu_domain_init(struct domain *d)
+int iommu_domain_init(struct domain *d, unsigned int opts)
 {
     struct domain_iommu *hd = dom_iommu(d);
     int ret = 0;
@@ -192,6 +192,15 @@  int iommu_domain_init(struct domain *d)
     if ( is_hardware_domain(d) )
         check_hwdom_reqs(d); /* may modify iommu_hwdom_strict */
 
+    /*
+     * Use shared page tables for HAP and IOMMU if the global option
+     * is enabled (from which we can infer the h/w is capable) and
+     * the domain options do not disallow it. HAP must, of course, also
+     * be enabled.
+     */
+    hd->hap_pt_share = hap_enabled(d) && iommu_hap_pt_share &&
+        !(opts & XEN_DOMCTL_IOMMU_no_sharept);
+
     /*
      * NB: 'relaxed' h/w domains don't need the IOMMU mappings to be kept
      *     in-sync with their assigned pages because all host RAM will be
@@ -200,6 +209,8 @@  int iommu_domain_init(struct domain *d)
     if ( !is_hardware_domain(d) || iommu_hwdom_strict )
         hd->need_sync = !iommu_use_hap_pt(d);
 
+    ASSERT(!(hd->need_sync && hd->hap_pt_share));
+
     return 0;
 }
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 1b3176adb5..ba84aea6ab 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -38,7 +38,7 @@ 
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000011
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -73,6 +73,14 @@  struct xen_domctl_createdomain {
 
     uint32_t flags;
 
+#define _XEN_DOMCTL_IOMMU_no_sharept  0
+#define XEN_DOMCTL_IOMMU_no_sharept   (1U << _XEN_DOMCTL_IOMMU_no_sharept)
+
+/* Max XEN_DOMCTL_IOMMU_* constant.  Used for ABI checking. */
+#define XEN_DOMCTL_IOMMU_MAX XEN_DOMCTL_IOMMU_no_sharept
+
+    uint32_t iommu_opts;
+
     /*
      * Various domain limits, which impact the quantity of resources (global
      * mapping space, xenheap, etc) a guest may consume.
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8b985f92b0..ad556f7d54 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -75,7 +75,7 @@  extern unsigned int iommu_dev_iotlb_timeout;
 int iommu_setup(void);
 int iommu_hardware_setup(void);
 
-int iommu_domain_init(struct domain *d);
+int iommu_domain_init(struct domain *d, unsigned int opts);
 void iommu_hwdom_init(struct domain *d);
 void iommu_domain_destroy(struct domain *d);
 
@@ -269,10 +269,14 @@  struct domain_iommu {
     /* Features supported by the IOMMU */
     DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
+    /* Does the guest share HAP mapping with the IOMMU? */
+    bool hap_pt_share;
+
     /*
-     * Does the guest reqire mappings to be synchonized, to maintain
-     * the default dfn == pfn map. (See comment on dfn at the top of
-     * include/xen/mm.h).
+     * Does the guest require mappings to be synchronized, to maintain
+     * the default dfn == pfn map? (See comment on dfn at the top of
+     * include/xen/mm.h). Note that hap_pt_share == false does not
+     * necessarily imply this is true.
      */
     bool need_sync;
 };
@@ -282,8 +286,7 @@  struct domain_iommu {
 #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
 
 /* Are we using the domain P2M table as its IOMMU pagetable? */
-#define iommu_use_hap_pt(d) \
-    (hap_enabled(d) && is_iommu_enabled(d) && iommu_hap_pt_share)
+#define iommu_use_hap_pt(d)       (dom_iommu(d)->hap_pt_share)
 
 /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
 #ifdef CONFIG_HAS_PASSTHROUGH