diff mbox series

[6/6] introduce a 'passthrough' configuration option to xl.cfg...

Message ID 20190730134419.2739-7-paul.durrant@citrix.com (mailing list archive)
State New, archived
Headers show
Series per-domain IOMMU control | expand

Commit Message

Paul Durrant July 30, 2019, 1:44 p.m. UTC
...and hence the ability to disable IOMMU mappings, and control EPT
sharing.

This patch introduces a new 'libxl_passthrough' enumeration into
libxl_domain_create_info. The value will be set by xl either when it parses
a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough
hardware specified for the domain.

If the value of the passthrough configuration option is 'disabled' then
the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain
flags, thus allowing the toolstack to control whether the domain gets
IOMMU mappings or not (where previously they were globally set).

If the value of the passthrough configuration option is 'sync_pt' then
a new 'iommu_opts' field in xen_domctl_createdomain will be set with the
value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default
set in iommu_hap_pt_share, thus allowing the toolstack to control whether
EPT sharing is used for the domain.

NOTE: The call to libxl_get_physinfo() in libxl__domain_make() is left
      in place to allow attempts to passthrough hardware on a hypervisor
      with disabled IOMMU support to be rejected early.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
 docs/man/xl.cfg.5.pod.in        | 52 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h             |  5 ++++
 tools/libxl/libxl_create.c      | 16 +++++++++-
 tools/libxl/libxl_types.idl     |  7 +++++
 tools/xl/xl_parse.c             | 38 ++++++++++++++++++++++++
 xen/arch/arm/domain.c           | 11 ++++++-
 xen/arch/x86/domain.c           |  3 +-
 xen/common/domain.c             |  7 +++++
 xen/drivers/passthrough/iommu.c | 13 ++++++++-
 xen/include/public/domctl.h     |  6 +++-
 xen/include/xen/iommu.h         | 17 +++++++----
 11 files changed, 165 insertions(+), 10 deletions(-)

Comments

Jan Beulich Aug. 7, 2019, 12:12 p.m. UTC | #1
On 30.07.2019 15:44, Paul Durrant wrote:
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -605,6 +605,58 @@ option should only be used with a trusted device tree.
>   Note that the partial device tree should avoid using the phandle 65000
>   which is reserved by the toolstack.
>   
> +=item B<passthrough="STRING">
> +
> +Specify whether IOMMU mappings are enabled for the domain and hence whether
> +it will be enabled for passthrough hardware. Valid values for this option
> +are:
> +
> +=over 4
> +
> +=item B<disabled>
> +
> +IOMMU mappings are disabled for the domain and so hardware may not be
> +passed through.
> +
> +This option is the default if no passthrough hardware is specified
> +in the domain's configuration.
> +
> +=item B<sync_pt>
> +
> +This option means that IOMMU mappings will be synchronized with the
> +domain's P2M table as follows:
> +
> +For a PV domain, all writable pages assigned to the domain are identity
> +mapped by MFN in the IOMMU page table. Thus a device driver running in the
> +domain may program passthrough hardware for DMA using MFN values
> +(i.e. host/machine frame numbers) looked up in its P2M.
> +
> +For an HVM domain, all non-foreign RAM pages present in its P2M will be
> +mapped by GFN in the IOMMU page table. Thus a device driver running in the
> +domain may program passthrough hardware using GFN values (i.e. guest
> +physical frame numbers) without any further translation.
> +
> +This option is the default if the domain is PV and passthrough hardware
> +is specified in the configuration.
> +
> +This option is not available on Arm.
> +
> +=item B<share_pt>
> +
> +This option is unavailable for a PV domain. For an HVM domain, this option
> +means that the IOMMU will be programmed to directly reference the domain's
> +P2M table as its page table. From the point of view of a device driver
> +running in the domain this is functionally equivalent to B<sync_pt> but
> +places less load on the hypervisor and so should generally be selected in
> +preference. However, the availability of this option is hardware specific
> +and thus, if it is specified for a domain running on hardware that does
> +not allow it, B<sync_pt> will be used instead.
> +
> +This option is the default if the domain is HVM and passthrough hardware
> +is specified in the configuration.

Perhaps worthwhile to clarify right away that on AMD we'll always
fall back to sync_pt?

Btw, I've no idea what the convention in xl.cfg is, but in the
hypervisor I'd ask to use hyphens instead of underscores in the
option value strings.

> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -614,6 +614,14 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>           return -EINVAL;
>       }
>   
> +    /* Always share P2M Table between the CPU and the IOMMU */
> +    if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept )
> +    {
> +        dprintk(XENLOG_INFO,
> +                "Unsupported iommu option: XEN_DOMCTL_IOMMU_no_sharept\n");
> +        return -EINVAL;

Seeing our excessive use of EINVAL, I'm generally suggesting to
use almost anything else where EINVAL isn't clearly the right
one to use. EOPNOTSUPP would seem better here, for example, not
the least because I assume it is at least a hypothetically
valid setting (if it was implemented).

> @@ -176,6 +176,15 @@ int iommu_domain_init(struct domain *d)
>       if ( ret )
>           return ret;
>   
> +    /*
> +     * Use shared page tables for HAP and IOMMU if the global option
> +     * is enabled (from which we can infer the h/w is capable) and
> +     * the domain options do not disallow it. HAP must, of course, also
> +     * be enabled.
> +     */
> +    hd->hap_pt_share = hap_enabled(d) && iommu_hap_pt_share &&
> +        !(opts & XEN_DOMCTL_IOMMU_no_sharept);

The hap_enabled() again raises the question of whether this
builds for Arm.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -68,7 +68,11 @@ struct xen_domctl_createdomain {
>   #define _XEN_DOMCTL_CDF_iommu         5
>   #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
>   
> -    uint32_t flags;
> +    uint16_t flags;
> +
> +#define _XEN_DOMCTL_IOMMU_no_sharept  0
> +#define XEN_DOMCTL_IOMMU_no_sharept   (1U<<_XEN_DOMCTL_IOMMU_no_sharept)
> +    uint16_t iommu_opts;

Are we sure of this split into to 16-bit fields? With a
growing number of settings to be established at domain
creation I don't think the 11 remaining bits for the
"general" flags won't last very long. IOW if such a split,
then I think we'd better have two 32-bit fields.

> @@ -256,10 +256,18 @@ struct domain_iommu {
>       /* Features supported by the IOMMU */
>       DECLARE_BITMAP(features, IOMMU_FEAT_count);
>   
> +    /*
> +     * Does the guest share HAP mapping with the IOMMU? This is always
> +     * true for ARM systems and may be true for x86 systems where the
> +     * the hardware is capable.
> +     */
> +    bool hap_pt_share;
> +
>      /*
>       * Does the guest reqire mappings to be synchonized, to maintain
> -     * the default dfn == pfn map. (See comment on dfn at the top of
> -     * include/xen/mm.h).
> +     * the default dfn == pfn map? (See comment on dfn at the top of
> +     * include/xen/mm.h). Note that hap_pt_share == false does not
> +     * necessarily imply this is true.

Would you mind also correcting the two spelling mistakes in the
first line of the comment that you don't touch right now?

Jan
Paul Durrant Aug. 14, 2019, 10:40 a.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 07 August 2019 13:13
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Julien Grall <julien.grall@arm.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Anthony Perard <anthony.perard@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>;
> Wei Liu <wl@xen.org>
> Subject: Re: [PATCH 6/6] introduce a 'passthrough' configuration option to xl.cfg...
> 
> On 30.07.2019 15:44, Paul Durrant wrote:
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -605,6 +605,58 @@ option should only be used with a trusted device tree.
> >   Note that the partial device tree should avoid using the phandle 65000
> >   which is reserved by the toolstack.
> >
> > +=item B<passthrough="STRING">
> > +
> > +Specify whether IOMMU mappings are enabled for the domain and hence whether
> > +it will be enabled for passthrough hardware. Valid values for this option
> > +are:
> > +
> > +=over 4
> > +
> > +=item B<disabled>
> > +
> > +IOMMU mappings are disabled for the domain and so hardware may not be
> > +passed through.
> > +
> > +This option is the default if no passthrough hardware is specified
> > +in the domain's configuration.
> > +
> > +=item B<sync_pt>
> > +
> > +This option means that IOMMU mappings will be synchronized with the
> > +domain's P2M table as follows:
> > +
> > +For a PV domain, all writable pages assigned to the domain are identity
> > +mapped by MFN in the IOMMU page table. Thus a device driver running in the
> > +domain may program passthrough hardware for DMA using MFN values
> > +(i.e. host/machine frame numbers) looked up in its P2M.
> > +
> > +For an HVM domain, all non-foreign RAM pages present in its P2M will be
> > +mapped by GFN in the IOMMU page table. Thus a device driver running in the
> > +domain may program passthrough hardware using GFN values (i.e. guest
> > +physical frame numbers) without any further translation.
> > +
> > +This option is the default if the domain is PV and passthrough hardware
> > +is specified in the configuration.
> > +
> > +This option is not available on Arm.
> > +
> > +=item B<share_pt>
> > +
> > +This option is unavailable for a PV domain. For an HVM domain, this option
> > +means that the IOMMU will be programmed to directly reference the domain's
> > +P2M table as its page table. From the point of view of a device driver
> > +running in the domain this is functionally equivalent to B<sync_pt> but
> > +places less load on the hypervisor and so should generally be selected in
> > +preference. However, the availability of this option is hardware specific
> > +and thus, if it is specified for a domain running on hardware that does
> > +not allow it, B<sync_pt> will be used instead.
> > +
> > +This option is the default if the domain is HVM and passthrough hardware
> > +is specified in the configuration.
> 
> Perhaps worthwhile to clarify right away that on AMD we'll always
> fall back to sync_pt?

Ok.

> 
> Btw, I've no idea what the convention in xl.cfg is, but in the
> hypervisor I'd ask to use hyphens instead of underscores in the
> option value strings.

Enums in the idl use underscores and so using them in the xl.cfg parameters means I can take advantage of the autogenerated string-to-enum helper.

> 
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -614,6 +614,14 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >           return -EINVAL;
> >       }
> >
> > +    /* Always share P2M Table between the CPU and the IOMMU */
> > +    if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept )
> > +    {
> > +        dprintk(XENLOG_INFO,
> > +                "Unsupported iommu option: XEN_DOMCTL_IOMMU_no_sharept\n");
> > +        return -EINVAL;
> 
> Seeing our excessive use of EINVAL, I'm generally suggesting to
> use almost anything else where EINVAL isn't clearly the right
> one to use. EOPNOTSUPP would seem better here, for example, not
> the least because I assume it is at least a hypothetically
> valid setting (if it was implemented).

Julien... Is ARM ever likely to not use shared P2M/IOMMU mappings?

> 
> > @@ -176,6 +176,15 @@ int iommu_domain_init(struct domain *d)
> >       if ( ret )
> >           return ret;
> >
> > +    /*
> > +     * Use shared page tables for HAP and IOMMU if the global option
> > +     * is enabled (from which we can infer the h/w is capable) and
> > +     * the domain options do not disallow it. HAP must, of course, also
> > +     * be enabled.
> > +     */
> > +    hd->hap_pt_share = hap_enabled(d) && iommu_hap_pt_share &&
> > +        !(opts & XEN_DOMCTL_IOMMU_no_sharept);
> 
> The hap_enabled() again raises the question of whether this
> builds for Arm.

It should, as long as the pre-requisite series https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg02253.html is applied. I'll double-check that though.

> 
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -68,7 +68,11 @@ struct xen_domctl_createdomain {
> >   #define _XEN_DOMCTL_CDF_iommu         5
> >   #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
> >
> > -    uint32_t flags;
> > +    uint16_t flags;
> > +
> > +#define _XEN_DOMCTL_IOMMU_no_sharept  0
> > +#define XEN_DOMCTL_IOMMU_no_sharept   (1U<<_XEN_DOMCTL_IOMMU_no_sharept)
> > +    uint16_t iommu_opts;
> 
> Are we sure of this split into to 16-bit fields? With a
> growing number of settings to be established at domain
> creation I don't think the 11 remaining bits for the
> "general" flags won't last very long. IOW if such a split,
> then I think we'd better have two 32-bit fields.

Ok. With only 6 bits in use for the flags I thought there was enough headroom... I'll grow the structure instead.

> 
> > @@ -256,10 +256,18 @@ struct domain_iommu {
> >       /* Features supported by the IOMMU */
> >       DECLARE_BITMAP(features, IOMMU_FEAT_count);
> >
> > +    /*
> > +     * Does the guest share HAP mapping with the IOMMU? This is always
> > +     * true for ARM systems and may be true for x86 systems where the
> > +     * the hardware is capable.
> > +     */
> > +    bool hap_pt_share;
> > +
> >      /*
> >       * Does the guest reqire mappings to be synchonized, to maintain
> > -     * the default dfn == pfn map. (See comment on dfn at the top of
> > -     * include/xen/mm.h).
> > +     * the default dfn == pfn map? (See comment on dfn at the top of
> > +     * include/xen/mm.h). Note that hap_pt_share == false does not
> > +     * necessarily imply this is true.
> 
> Would you mind also correcting the two spelling mistakes in the
> first line of the comment that you don't touch right now?

Happy to do that,

  Paul

> 
> Jan
diff mbox series

Patch

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index c99d40307e..c669524bec 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -605,6 +605,58 @@  option should only be used with a trusted device tree.
 Note that the partial device tree should avoid using the phandle 65000
 which is reserved by the toolstack.
 
+=item B<passthrough="STRING">
+
+Specify whether IOMMU mappings are enabled for the domain and hence whether
+it will be enabled for passthrough hardware. Valid values for this option
+are:
+
+=over 4
+
+=item B<disabled>
+
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through.
+
+This option is the default if no passthrough hardware is specified
+in the domain's configuration.
+
+=item B<sync_pt>
+
+This option means that IOMMU mappings will be synchronized with the
+domain's P2M table as follows:
+
+For a PV domain, all writable pages assigned to the domain are identity
+mapped by MFN in the IOMMU page table. Thus a device driver running in the
+domain may program passthrough hardware for DMA using MFN values
+(i.e. host/machine frame numbers) looked up in its P2M.
+
+For an HVM domain, all non-foreign RAM pages present in its P2M will be
+mapped by GFN in the IOMMU page table. Thus a device driver running in the
+domain may program passthrough hardware using GFN values (i.e. guest
+physical frame numbers) without any further translation.
+
+This option is the default if the domain is PV and passthrough hardware
+is specified in the configuration.
+
+This option is not available on Arm.
+
+=item B<share_pt>
+
+This option is unavailable for a PV domain. For an HVM domain, this option
+means that the IOMMU will be programmed to directly reference the domain's
+P2M table as its page table. From the point of view of a device driver
+running in the domain this is functionally equivalent to B<sync_pt> but
+places less load on the hypervisor and so should generally be selected in
+preference. However, the availability of this option is hardware specific
+and thus, if it is specified for a domain running on hardware that does
+not allow it, B<sync_pt> will be used instead.
+
+This option is the default if the domain is HVM and passthrough hardware
+is specified in the configuration.
+
+=back
+
 =back
 
 =head2 Devices
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 9bacfb97f0..5de7c07a41 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -394,6 +394,11 @@ 
  */
 #define LIBXL_HAVE_EXTENDED_VKB 1
 
+/*
+ * libxl_domain_create_info has libxl_passthrough enumeration.
+ */
+#define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index feb9f1ce0c..ad5f36484a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -565,12 +565,26 @@  int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
                 libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
         }
 
+        LOG(DETAIL, "passthrough: %s",
+            libxl_passthrough_to_string(info->passthrough));
+
         rc = libxl_get_physinfo(ctx, &physinfo);
         if (rc < 0)
             goto out;
 
-        if ( physinfo.cap_hvm_directio )
+        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
+        {
+            if (!physinfo.cap_hvm_directio) {
+                LOGED(ERROR, *domid, "passthrough not available");
+                rc = ERROR_FAIL;
+                goto out;
+            }
+
             create.flags |= XEN_DOMCTL_CDF_iommu;
+        }
+
+        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
 
         /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
         libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index b61399ce36..7e37de8646 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -263,6 +263,12 @@  libxl_vkb_backend = Enumeration("vkb_backend", [
     (2, "LINUX")
     ])
 
+libxl_passthrough = Enumeration("passthrough", [
+    (0, "disabled"),
+    (1, "sync_pt"),
+    (2, "share_pt"),
+    ])
+
 #
 # Complex libxl types
 #
@@ -408,6 +414,7 @@  libxl_domain_create_info = Struct("domain_create_info",[
     ("pool_name",    string),
     ("run_hotplug_scripts",libxl_defbool),
     ("driver_domain",libxl_defbool),
+    ("passthrough",  libxl_passthrough),
     ], dir=DIR_IN)
 
 libxl_domain_restore_params = Struct("domain_restore_params", [
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index e105bda2bb..c904604008 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2326,6 +2326,44 @@  skip_vfb:
         }
     }
 
+    if (!xlu_cfg_get_string(config, "passthrough", &buf, 0)) {
+        libxl_passthrough o;
+
+        e = libxl_passthrough_from_string(buf, &o);
+        if (e) {
+            fprintf(stderr,
+                    "ERROR: unknown passthrough option '%s'\n",
+                    buf);
+            exit(-ERROR_FAIL);
+        }
+
+        switch (o) {
+        case LIBXL_PASSTHROUGH_DISABLED:
+            if (d_config->num_pcidevs || d_config->num_dtdevs) {
+                fprintf(stderr,
+                        "ERROR: passthrough disabled but devices are specified\n");
+                exit(-ERROR_FAIL);
+            }
+        case LIBXL_PASSTHROUGH_SHARE_PT:
+            if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
+                fprintf(stderr,
+                        "ERROR: passthrough=\"share_pt\" not valid for PV domain\n");
+                exit(-ERROR_FAIL);
+            }
+        default:
+            break;
+        }
+
+        c_info->passthrough = o;
+    } else if (d_config->num_pcidevs || d_config->num_dtdevs) {
+        /*
+         * Passthrough devices are specified so set an appropriate
+         * default value.
+         */
+        c_info->passthrough = (c_info->type == LIBXL_DOMAIN_TYPE_PV) ?
+            LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT;
+    }
+
     if (!xlu_cfg_get_list(config, "usbctrl", &usbctrls, 0, 0)) {
         d_config->num_usbctrls = 0;
         d_config->usbctrls = NULL;
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index e06bd27dad..233dc4d59d 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -614,6 +614,14 @@  int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    /* Always share P2M Table between the CPU and the IOMMU */
+    if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept )
+    {
+        dprintk(XENLOG_INFO,
+                "Unsupported iommu option: XEN_DOMCTL_IOMMU_no_sharept\n");
+        return -EINVAL;
+    }
+
     /* Fill in the native GIC version, passed back to the toolstack. */
     if ( config->arch.gic_version == XEN_DOMCTL_CONFIG_GIC_NATIVE )
     {
@@ -673,7 +681,8 @@  int arch_domain_create(struct domain *d,
 
     ASSERT(config != NULL);
 
-    if ( is_iommu_enabled(d) && (rc = iommu_domain_init(d)) != 0 )
+    if ( is_iommu_enabled(d) &&
+         (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
 
     if ( (rc = p2m_init(d)) != 0 )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 42778099da..4bbd8de663 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -604,7 +604,8 @@  int arch_domain_create(struct domain *d,
     if ( (rc = init_domain_irq_mapping(d)) != 0 )
         goto fail;
 
-    if ( is_iommu_enabled(d) && (rc = iommu_domain_init(d)) != 0 )
+    if ( is_iommu_enabled(d) &&
+         (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
 
     psr_domain_init(d);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 0df4b47352..b0b9a17a18 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -308,6 +308,13 @@  static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    if ( !(config->flags & XEN_DOMCTL_CDF_iommu) && config->iommu_opts )
+    {
+        dprintk(XENLOG_INFO,
+                "IOMMU options specified but IOMMU not enabled\n");
+        return -EINVAL;
+    }
+
     if ( config->max_vcpus < 1 )
     {
         dprintk(XENLOG_INFO, "No vCPUS\n");
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 67855eeed5..1c556e39ab 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -159,7 +159,7 @@  static void __hwdom_init check_hwdom_reqs(struct domain *d)
     iommu_hwdom_strict = true;
 }
 
-int iommu_domain_init(struct domain *d)
+int iommu_domain_init(struct domain *d, unsigned int opts)
 {
     struct domain_iommu *hd = dom_iommu(d);
     int ret = 0;
@@ -176,6 +176,15 @@  int iommu_domain_init(struct domain *d)
     if ( ret )
         return ret;
 
+    /*
+     * Use shared page tables for HAP and IOMMU if the global option
+     * is enabled (from which we can infer the h/w is capable) and
+     * the domain options do not disallow it. HAP must, of course, also
+     * be enabled.
+     */
+    hd->hap_pt_share = hap_enabled(d) && iommu_hap_pt_share &&
+        !(opts & XEN_DOMCTL_IOMMU_no_sharept);
+
     /*
      * NB: 'relaxed' h/w domains don't need the IOMMU mappings to be kept
      *     in-sync with their assigned pages because all host RAM will be
@@ -187,6 +196,8 @@  int iommu_domain_init(struct domain *d)
     if ( !is_hardware_domain(d) || iommu_hwdom_strict )
         hd->need_sync = !iommu_use_hap_pt(d);
 
+    ASSERT(!(hd->need_sync && hd->hap_pt_share));
+
     return 0;
 }
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 3f82c78870..ebab653cac 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -68,7 +68,11 @@  struct xen_domctl_createdomain {
 #define _XEN_DOMCTL_CDF_iommu         5
 #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
 
-    uint32_t flags;
+    uint16_t flags;
+
+#define _XEN_DOMCTL_IOMMU_no_sharept  0
+#define XEN_DOMCTL_IOMMU_no_sharept   (1U<<_XEN_DOMCTL_IOMMU_no_sharept)
+    uint16_t iommu_opts;
 
     /*
      * Various domain limits, which impact the quantity of resources (global
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 45ec6cfe44..40e59a4fbf 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -67,7 +67,7 @@  extern unsigned int iommu_dev_iotlb_timeout;
 int iommu_setup(void);
 int iommu_hardware_setup(void);
 
-int iommu_domain_init(struct domain *d);
+int iommu_domain_init(struct domain *d, unsigned int opts);
 void iommu_hwdom_init(struct domain *d);
 void iommu_domain_destroy(struct domain *d);
 
@@ -256,10 +256,18 @@  struct domain_iommu {
     /* Features supported by the IOMMU */
     DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
+    /*
+     * Does the guest share HAP mapping with the IOMMU? This is always
+     * true for ARM systems and may be true for x86 systems where the
+     * the hardware is capable.
+     */
+    bool hap_pt_share;
+
     /*
      * Does the guest reqire mappings to be synchonized, to maintain
-     * the default dfn == pfn map. (See comment on dfn at the top of
-     * include/xen/mm.h).
+     * the default dfn == pfn map? (See comment on dfn at the top of
+     * include/xen/mm.h). Note that hap_pt_share == false does not
+     * necessarily imply this is true.
      */
     bool need_sync;
 };
@@ -269,8 +277,7 @@  struct domain_iommu {
 #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
 
 /* Are we using the domain P2M table as its IOMMU pagetable? */
-#define iommu_use_hap_pt(d) \
-    (hap_enabled(d) && is_iommu_enabled(d) && iommu_hap_pt_share)
+#define iommu_use_hap_pt(d)       (dom_iommu(d)->hap_pt_share)
 
 /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
 #define need_iommu_pt_sync(d)     (dom_iommu(d)->need_sync)