Message ID | 20190916092708.2624-1-paul.durrant@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | None | expand |
On 9/16/19 10:27 AM, Paul Durrant wrote: > ...and hence the ability to disable IOMMU mappings, and control EPT > sharing. > > This patch introduces a new 'libxl_passthrough' enumeration into > libxl_domain_create_info. The value will be set by xl either when it parses > a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough > hardware specified for the domain. > > If the value of the passthrough configuration option is 'disabled' then > the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain > flags, thus allowing the toolstack to control whether the domain gets > IOMMU mappings or not (where previously they were globally set). > > If the value of the passthrough configuration option is 'sync_pt' then > a new 'iommu_opts' field in xen_domctl_createdomain will be set with the > value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default > set in iommu_hap_pt_share, thus allowing the toolstack to control whether > EPT sharing is used for the domain. > > If the value of passthrough is 'enabled' then xl will choose an appropriate > default according to the type of domain and hardware support. Minor suggestion: I prefer using a word like "auto" when you're letting the computer decide something. I'd also... > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in > index c99d40307e..154d847fb9 100644 > --- a/docs/man/xl.cfg.5.pod.in > +++ b/docs/man/xl.cfg.5.pod.in > @@ -605,6 +605,62 @@ option should only be used with a trusted device tree. > Note that the partial device tree should avoid using the phandle 65000 > which is reserved by the toolstack. > > +=item B<passthrough="STRING"> > + > +Specify whether IOMMU mappings are enabled for the domain and hence whether > +it will be enabled for passthrough hardware. Valid values for this option > +are: > + > +=over 4 > + > +=item B<disabled> > + > +IOMMU mappings are disabled for the domain and so hardware may not be > +passed through. > + > +This option is the default if no passthrough hardware is specified in the > +domain's configuration. > + > +=item B<sync_pt> > + > +This option means that IOMMU mappings will be synchronized with the > +domain's P2M table as follows: > + > +For a PV domain, all writable pages assigned to the domain are identity > +mapped by MFN in the IOMMU page table. Thus a device driver running in the > +domain may program passthrough hardware for DMA using MFN values > +(i.e. host/machine frame numbers) looked up in its P2M. > + > +For an HVM domain, all non-foreign RAM pages present in its P2M will be > +mapped by GFN in the IOMMU page table. Thus a device driver running in the > +domain may program passthrough hardware using GFN values (i.e. guest > +physical frame numbers) without any further translation. > + > +This option is not currently available on Arm. > + > +=item B<share_pt> > + > +This option is unavailable for a PV domain. For an HVM domain, this option > +means that the IOMMU will be programmed to directly reference the domain's > +P2M table as its page table. From the point of view of a device driver > +running in the domain this is functionally equivalent to B<sync_pt> but > +places less load on the hypervisor and so should generally be selected in > +preference. However, the availability of this option is hardware specific. > +If B<xl info> reports B<virt_caps> containing B<iommu_hap_pt_share> then > +this option may be used. > + > +=item B<enabled> > + > +This option enables IOMMU mappings and selects an appropriate default > +operating mode. For HVM domains running on platforms where the option is > +available, this is equivalent to B<share_pt>. Otherwise, and also for PV > +domains, this options is equivalent to B<sync_pt>. ...put the option we want / expect people nearer the top (either first or second). Thanks, -George
On Mon, Sep 16, 2019 at 10:27:08AM +0100, Paul Durrant wrote: > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 59dbcb50a0..7afae81432 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -30,6 +30,12 @@ > int libxl__domain_create_info_setdefault(libxl__gc *gc, > libxl_domain_create_info *c_info) > { > + libxl_physinfo info; > + int rc = libxl_get_physinfo(CTX, &info); > + > + if (rc) > + return rc; > + I think this hunk would be more readable if it was written: int rc; rc = libxl_get_physinfo(CTX, &info); if (rc) return rc; Otherwise, the check for error is alone which might mean "we are in a callback function and check for rc passed by parameter" or that it's a stray check. > @@ -62,6 +62,12 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc, > if (!c_info->ssidref) > c_info->ssidref = SECINITSID_DOMU; > > + if (info.cap_hvm_directio) { > + c_info->passthrough = ((c_info->type == LIBXL_DOMAIN_TYPE_PV) || > + !info.cap_iommu_hap_pt_share) ? > + LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT; > + } I don't think you can do that. This will overwrite the value that was in c_info before (that a user as set). The _setdefault() function is only called after c_info have been filled by users of the libxl, it only has to change the value if it was the default. c_info->passthrough has no default value, so it's not possible to know what a user wants. What about adding "default"==0 in libxl_passthrough enum? > return 0; > } > > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > index d52c63b6b0..22f05711e3 100644 > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -263,6 +263,12 @@ libxl_vkb_backend = Enumeration("vkb_backend", [ > (2, "LINUX") > ]) > > +libxl_passthrough = Enumeration("passthrough", [ > + (0, "disabled"), > + (1, "sync_pt"), > + (2, "share_pt"), > + ]) > + > # > # Complex libxl types > # > diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c > index 293f5f730e..4b2baa0403 100644 > --- a/tools/xl/xl_parse.c > +++ b/tools/xl/xl_parse.c > @@ -1461,6 +1461,107 @@ void parse_config_data(const char *config_source, > + if (xlu_cfg_get_string(config, "passthrough", &buf, 0)) { > + buf = (d_config->num_pcidevs || d_config->num_dtdevs) > + ? "enabled" : "disabled"; > + } > + > + if (!strncmp(buf, "enabled", strlen(buf))) { Do you intend to have "passthrough=''" been the equivalent of "passthrough='enabled'" in the config file ? (same with "e", "en", "ena", ...) > + /* Choose a suitable default */ > + c_info->passthrough = > + (c_info->type == LIBXL_DOMAIN_TYPE_PV) || !iommu_hap_pt_share > + ? LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT; Thanks,
> -----Original Message----- > From: Anthony PERARD <anthony.perard@citrix.com> > Sent: 16 September 2019 15:06 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: xen-devel@lists.xenproject.org; Jan Beulich <jbeulich@suse.com>; Christian Lindig > <christian.lindig@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu <wl@xen.org>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Julien Grall > <julien.grall@arm.com>; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini > <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; David Scott <dave@recoil.org>; Volodymyr > Babchuk <Volodymyr_Babchuk@epam.com>; Roger Pau Monne <roger.pau@citrix.com> > Subject: Re: [PATCH v12 6/6] introduce a 'passthrough' configuration option to xl.cfg... > > On Mon, Sep 16, 2019 at 10:27:08AM +0100, Paul Durrant wrote: > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > > index 59dbcb50a0..7afae81432 100644 > > --- a/tools/libxl/libxl_create.c > > +++ b/tools/libxl/libxl_create.c > > @@ -30,6 +30,12 @@ > > int libxl__domain_create_info_setdefault(libxl__gc *gc, > > libxl_domain_create_info *c_info) > > { > > + libxl_physinfo info; > > + int rc = libxl_get_physinfo(CTX, &info); > > + > > + if (rc) > > + return rc; > > + > > I think this hunk would be more readable if it was written: > int rc; > > rc = libxl_get_physinfo(CTX, &info); > if (rc) > return rc; > > Otherwise, the check for error is alone which might mean "we are in a > callback function and check for rc passed by parameter" or that it's a > stray check. > > > @@ -62,6 +62,12 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc, > > if (!c_info->ssidref) > > c_info->ssidref = SECINITSID_DOMU; > > > > + if (info.cap_hvm_directio) { > > + c_info->passthrough = ((c_info->type == LIBXL_DOMAIN_TYPE_PV) || > > + !info.cap_iommu_hap_pt_share) ? > > + LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT; > > + } > > I don't think you can do that. This will overwrite the value that was in > c_info before (that a user as set). The _setdefault() function is only > called after c_info have been filled by users of the libxl, it only has > to change the value if it was the default. Oh, what a useful semantic. > c_info->passthrough has no > default value, so it's not possible to know what a user wants. > > What about adding "default"==0 in libxl_passthrough enum? I guess that's probably the only option, but there is still the problem of the page table memory overhead. I guess I'd have to assume the worst case for 'default'. That separation between create info and build info is really unhelpful. > > > return 0; > > } > > > > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > > index d52c63b6b0..22f05711e3 100644 > > --- a/tools/libxl/libxl_types.idl > > +++ b/tools/libxl/libxl_types.idl > > @@ -263,6 +263,12 @@ libxl_vkb_backend = Enumeration("vkb_backend", [ > > (2, "LINUX") > > ]) > > > > +libxl_passthrough = Enumeration("passthrough", [ > > + (0, "disabled"), > > + (1, "sync_pt"), > > + (2, "share_pt"), > > + ]) > > + > > # > > # Complex libxl types > > # > > diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c > > index 293f5f730e..4b2baa0403 100644 > > --- a/tools/xl/xl_parse.c > > +++ b/tools/xl/xl_parse.c > > @@ -1461,6 +1461,107 @@ void parse_config_data(const char *config_source, > > + if (xlu_cfg_get_string(config, "passthrough", &buf, 0)) { > > + buf = (d_config->num_pcidevs || d_config->num_dtdevs) > > + ? "enabled" : "disabled"; > > + } > > + > > + if (!strncmp(buf, "enabled", strlen(buf))) { > > Do you intend to have "passthrough=''" been the equivalent of > "passthrough='enabled'" in the config file ? > (same with "e", "en", "ena", ...) No, I guess not; I'll fix that. Paul
Hi Paul, On 9/16/19 10:27 AM, Paul Durrant wrote: > ...and hence the ability to disable IOMMU mappings, and control EPT > sharing. > > This patch introduces a new 'libxl_passthrough' enumeration into > libxl_domain_create_info. The value will be set by xl either when it parses > a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough > hardware specified for the domain. > > If the value of the passthrough configuration option is 'disabled' then > the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain > flags, thus allowing the toolstack to control whether the domain gets > IOMMU mappings or not (where previously they were globally set). > > If the value of the passthrough configuration option is 'sync_pt' then > a new 'iommu_opts' field in xen_domctl_createdomain will be set with the > value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default > set in iommu_hap_pt_share, thus allowing the toolstack to control whether > EPT sharing is used for the domain. > > If the value of passthrough is 'enabled' then xl will choose an appropriate > default according to the type of domain and hardware support. > > NOTE: The 'iommu_memkb' overhead in libxl_domain_build_info will now only > be set if passthrough is 'sync_pt' (or xl has chosen this mode as > a default). > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > Acked-by: Christian Lindig <christian.lindig@citrix.com> > --- > Cc: Ian Jackson <ian.jackson@eu.citrix.com> > Cc: Wei Liu <wl@xen.org> > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > Cc: George Dunlap <George.Dunlap@eu.citrix.com> > Cc: Julien Grall <julien.grall@arm.com> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Cc: Stefano Stabellini <sstabellini@kernel.org> > Cc: Tim Deegan <tim@xen.org> > Cc: Anthony PERARD <anthony.perard@citrix.com> > Cc: David Scott <dave@recoil.org> > Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> > Cc: "Roger Pau Monné" <roger.pau@citrix.com> > > Sending just this patch since the rest of the series is unchanged from > v11(.1). > > Previously part of series https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg02267.html > > v12: > - Set passthrough default in libxl > > v11: > - Fixed abi-check runes > > v10: > - Added abi-check runes > > v9: > - Added the passthrough='enabled' option to xl > - One cosmetic change in xen > - Assume Jan's R-b stands since non-cosmetic changes are only in the > toolstack > > v7: > - Added missing breaks > - Added missing ocaml binding changes > > v6: > - Remove the libxl_physinfo() call since it's usefulness is limited to x86 > > v5: > - Expand xen_domctl_createdomain flags field and hence bump interface > version > - Fix spelling mistakes in context line > --- > docs/man/xl.cfg.5.pod.in | 56 ++++++++++ > tools/libxl/libxl.h | 9 ++ > tools/libxl/libxl_create.c | 27 +++-- > tools/libxl/libxl_types.idl | 7 ++ > tools/ocaml/libs/xc/xenctrl.ml | 4 + > tools/ocaml/libs/xc/xenctrl.mli | 5 + > tools/ocaml/libs/xc/xenctrl_stubs.c | 17 ++- > tools/xl/xl_parse.c | 158 ++++++++++++++++++---------- > xen/arch/arm/domain.c | 10 +- > xen/arch/x86/domain.c | 2 +- > xen/common/domain.c | 7 ++ > xen/common/domctl.c | 13 --- > xen/drivers/passthrough/iommu.c | 13 ++- > xen/include/public/domctl.h | 10 +- > xen/include/xen/iommu.h | 15 +-- > 15 files changed, 266 insertions(+), 87 deletions(-) > > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in > index c99d40307e..154d847fb9 100644 > --- a/docs/man/xl.cfg.5.pod.in > +++ b/docs/man/xl.cfg.5.pod.in > @@ -605,6 +605,62 @@ option should only be used with a trusted device tree. > Note that the partial device tree should avoid using the phandle 65000 > which is reserved by the toolstack. > > +=item B<passthrough="STRING"> > + > +Specify whether IOMMU mappings are enabled for the domain and hence whether > +it will be enabled for passthrough hardware. Valid values for this option > +are: > + > +=over 4 > + > +=item B<disabled> > + > +IOMMU mappings are disabled for the domain and so hardware may not be > +passed through. > + > +This option is the default if no passthrough hardware is specified in the > +domain's configuration. > + > +=item B<sync_pt> > + > +This option means that IOMMU mappings will be synchronized with the > +domain's P2M table as follows: > + > +For a PV domain, all writable pages assigned to the domain are identity > +mapped by MFN in the IOMMU page table. Thus a device driver running in the > +domain may program passthrough hardware for DMA using MFN values > +(i.e. host/machine frame numbers) looked up in its P2M. > + > +For an HVM domain, all non-foreign RAM pages present in its P2M will be Apologies to make the comment this late (it probably can be addressed in a follow-up if the patch is good to go). From the user point of view, they have the choice between HVM and PVH. The fact that PVH are based on HVM domain is a Xen internal (actually we were using PV for Àrm in the past). So we may want to say HVM/PVH here and other instance below. Other than that the Arm changes looks good to me: Acked-by: Julien Grall <julien.grall@arm.com> Cheers,
> -----Original Message----- > From: George Dunlap <george.dunlap@citrix.com> > Sent: 16 September 2019 11:15 > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org > Cc: Jan Beulich <jbeulich@suse.com>; Christian Lindig <christian.lindig@citrix.com>; Ian Jackson > <Ian.Jackson@citrix.com>; Wei Liu <wl@xen.org>; Andrew Cooper <Andrew.Cooper3@citrix.com>; George > Dunlap <George.Dunlap@citrix.com>; Julien Grall <julien.grall@arm.com>; Konrad Rzeszutek Wilk > <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>; Tim (Xen.org) <tim@xen.org>; > Anthony Perard <anthony.perard@citrix.com>; David Scott <dave@recoil.org>; Volodymyr Babchuk > <Volodymyr_Babchuk@epam.com>; Roger Pau Monne <roger.pau@citrix.com> > Subject: Re: [PATCH v12 6/6] introduce a 'passthrough' configuration option to xl.cfg... > > On 9/16/19 10:27 AM, Paul Durrant wrote: > > ...and hence the ability to disable IOMMU mappings, and control EPT > > sharing. > > > > This patch introduces a new 'libxl_passthrough' enumeration into > > libxl_domain_create_info. The value will be set by xl either when it parses > > a new 'passthrough' option in xl.cfg, or implicitly if there is passthrough > > hardware specified for the domain. > > > > If the value of the passthrough configuration option is 'disabled' then > > the XEN_DOMCTL_CDF_iommu flag will be clear in the xen_domctl_createdomain > > flags, thus allowing the toolstack to control whether the domain gets > > IOMMU mappings or not (where previously they were globally set). > > > > If the value of the passthrough configuration option is 'sync_pt' then > > a new 'iommu_opts' field in xen_domctl_createdomain will be set with the > > value XEN_DOMCTL_IOMMU_no_sharept. This will override the global default > > set in iommu_hap_pt_share, thus allowing the toolstack to control whether > > EPT sharing is used for the domain. > > > > If the value of passthrough is 'enabled' then xl will choose an appropriate > > default according to the type of domain and hardware support. > > Minor suggestion: I prefer using a word like "auto" when you're letting > the computer decide something. Not sure I like that. I'll wait for a second opinion on that one. > > I'd also... > > > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in > > index c99d40307e..154d847fb9 100644 > > --- a/docs/man/xl.cfg.5.pod.in > > +++ b/docs/man/xl.cfg.5.pod.in > > @@ -605,6 +605,62 @@ option should only be used with a trusted device tree. > > Note that the partial device tree should avoid using the phandle 65000 > > which is reserved by the toolstack. > > > > +=item B<passthrough="STRING"> > > + > > +Specify whether IOMMU mappings are enabled for the domain and hence whether > > +it will be enabled for passthrough hardware. Valid values for this option > > +are: > > + > > +=over 4 > > + > > +=item B<disabled> > > + > > +IOMMU mappings are disabled for the domain and so hardware may not be > > +passed through. > > + > > +This option is the default if no passthrough hardware is specified in the > > +domain's configuration. > > + > > +=item B<sync_pt> > > + > > +This option means that IOMMU mappings will be synchronized with the > > +domain's P2M table as follows: > > + > > +For a PV domain, all writable pages assigned to the domain are identity > > +mapped by MFN in the IOMMU page table. Thus a device driver running in the > > +domain may program passthrough hardware for DMA using MFN values > > +(i.e. host/machine frame numbers) looked up in its P2M. > > + > > +For an HVM domain, all non-foreign RAM pages present in its P2M will be > > +mapped by GFN in the IOMMU page table. Thus a device driver running in the > > +domain may program passthrough hardware using GFN values (i.e. guest > > +physical frame numbers) without any further translation. > > + > > +This option is not currently available on Arm. > > + > > +=item B<share_pt> > > + > > +This option is unavailable for a PV domain. For an HVM domain, this option > > +means that the IOMMU will be programmed to directly reference the domain's > > +P2M table as its page table. From the point of view of a device driver > > +running in the domain this is functionally equivalent to B<sync_pt> but > > +places less load on the hypervisor and so should generally be selected in > > +preference. However, the availability of this option is hardware specific. > > +If B<xl info> reports B<virt_caps> containing B<iommu_hap_pt_share> then > > +this option may be used. > > + > > +=item B<enabled> > > + > > +This option enables IOMMU mappings and selects an appropriate default > > +operating mode. For HVM domains running on platforms where the option is > > +available, this is equivalent to B<share_pt>. Otherwise, and also for PV > > +domains, this options is equivalent to B<sync_pt>. > > ...put the option we want / expect people nearer the top (either first > or second). > Ok, I'll do that. Paul > Thanks, > -George
diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index c99d40307e..154d847fb9 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -605,6 +605,62 @@ option should only be used with a trusted device tree. Note that the partial device tree should avoid using the phandle 65000 which is reserved by the toolstack. +=item B<passthrough="STRING"> + +Specify whether IOMMU mappings are enabled for the domain and hence whether +it will be enabled for passthrough hardware. Valid values for this option +are: + +=over 4 + +=item B<disabled> + +IOMMU mappings are disabled for the domain and so hardware may not be +passed through. + +This option is the default if no passthrough hardware is specified in the +domain's configuration. + +=item B<sync_pt> + +This option means that IOMMU mappings will be synchronized with the +domain's P2M table as follows: + +For a PV domain, all writable pages assigned to the domain are identity +mapped by MFN in the IOMMU page table. Thus a device driver running in the +domain may program passthrough hardware for DMA using MFN values +(i.e. host/machine frame numbers) looked up in its P2M. + +For an HVM domain, all non-foreign RAM pages present in its P2M will be +mapped by GFN in the IOMMU page table. Thus a device driver running in the +domain may program passthrough hardware using GFN values (i.e. guest +physical frame numbers) without any further translation. + +This option is not currently available on Arm. + +=item B<share_pt> + +This option is unavailable for a PV domain. For an HVM domain, this option +means that the IOMMU will be programmed to directly reference the domain's +P2M table as its page table. From the point of view of a device driver +running in the domain this is functionally equivalent to B<sync_pt> but +places less load on the hypervisor and so should generally be selected in +preference. However, the availability of this option is hardware specific. +If B<xl info> reports B<virt_caps> containing B<iommu_hap_pt_share> then +this option may be used. + +=item B<enabled> + +This option enables IOMMU mappings and selects an appropriate default +operating mode. For HVM domains running on platforms where the option is +available, this is equivalent to B<share_pt>. Otherwise, and also for PV +domains, this options is equivalent to B<sync_pt>. + +This option is the default if passthrough hardware is specified in the +domain's configuration. + +=back + =back =head2 Devices diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 12545130df..e4b9c539b6 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -415,6 +415,15 @@ */ #define LIBXL_HAVE_BUILDINFO_IOMMU_MEMKB 1 +/* + * LIBXL_HAVE_CREATEINFO_PASSTHROUGH indicates that + * libxl_domain_create_info has a passthrough field (which is a + * libxl_passthrough enumeration) that indicates whether device pass- + * through is enabled for the domain and, if so, whether the IOMMU and + * HAP page tables may be shared or not. + */ +#define LIBXL_HAVE_CREATEINFO_PASSTHROUGH 1 + /* * libxl ABI compatibility * diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 59dbcb50a0..7afae81432 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -30,6 +30,12 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc, libxl_domain_create_info *c_info) { + libxl_physinfo info; + int rc = libxl_get_physinfo(CTX, &info); + + if (rc) + return rc; + if (!c_info->type) { LOG(ERROR, "domain type unspecified"); return ERROR_INVAL; @@ -38,12 +44,6 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc, libxl__arch_domain_create_info_setdefault(gc, c_info); if (c_info->type != LIBXL_DOMAIN_TYPE_PV) { - libxl_physinfo info; - int rc = libxl_get_physinfo(CTX, &info); - - if (rc) - return rc; - if (info.cap_hap) libxl_defbool_setdefault(&c_info->hap, true); else if (info.cap_shadow) @@ -62,6 +62,12 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc, if (!c_info->ssidref) c_info->ssidref = SECINITSID_DOMU; + if (info.cap_hvm_directio) { + c_info->passthrough = ((c_info->type == LIBXL_DOMAIN_TYPE_PV) || + !info.cap_iommu_hap_pt_share) ? + LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT; + } + return 0; } @@ -578,6 +584,15 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config, libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off; } + LOG(DETAIL, "passthrough: %s", + libxl_passthrough_to_string(info->passthrough)); + + if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED) + create.flags |= XEN_DOMCTL_CDF_iommu; + + if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT) + create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept; + /* Ultimately, handle is an array of 16 uint8_t, same as uuid */ libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index d52c63b6b0..22f05711e3 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -263,6 +263,12 @@ libxl_vkb_backend = Enumeration("vkb_backend", [ (2, "LINUX") ]) +libxl_passthrough = Enumeration("passthrough", [ + (0, "disabled"), + (1, "sync_pt"), + (2, "share_pt"), + ]) + # # Complex libxl types # @@ -408,6 +414,7 @@ libxl_domain_create_info = Struct("domain_create_info",[ ("pool_name", string), ("run_hotplug_scripts",libxl_defbool), ("driver_domain",libxl_defbool), + ("passthrough", libxl_passthrough), ], dir=DIR_IN) libxl_domain_restore_params = Struct("domain_restore_params", [ diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml index de4bae6012..e00a74d48d 100644 --- a/tools/ocaml/libs/xc/xenctrl.ml +++ b/tools/ocaml/libs/xc/xenctrl.ml @@ -65,11 +65,15 @@ type domain_create_flag = | CDF_XS_DOMAIN | CDF_IOMMU +type domain_create_iommu_opts = + | IOMMU_NO_SHAREPT + type domctl_create_config = { ssidref: int32; handle: string; flags: domain_create_flag list; + iommu_opts: domain_create_iommu_opts list; max_vcpus: int; max_evtchn_port: int; max_grant_frames: int; diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli index c885e75895..0e7049d708 100644 --- a/tools/ocaml/libs/xc/xenctrl.mli +++ b/tools/ocaml/libs/xc/xenctrl.mli @@ -57,10 +57,15 @@ type domain_create_flag = | CDF_OOS_OFF | CDF_XS_DOMAIN | CDF_IOMMU + +type domain_create_iommu_opts = + | IOMMU_NO_SHAREPT + type domctl_create_config = { ssidref: int32; handle: string; flags: domain_create_flag list; + iommu_opts: domain_create_iommu_opts list; max_vcpus: int; max_evtchn_port: int; max_grant_frames: int; diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index 0cdd873599..48f39f81d5 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -190,11 +190,12 @@ CAMLprim value stub_xc_domain_create(value xch, value config) #define VAL_SSIDREF Field(config, 0) #define VAL_HANDLE Field(config, 1) #define VAL_FLAGS Field(config, 2) -#define VAL_MAX_VCPUS Field(config, 3) -#define VAL_MAX_EVTCHN_PORT Field(config, 4) -#define VAL_MAX_GRANT_FRAMES Field(config, 5) -#define VAL_MAX_MAPTRACK_FRAMES Field(config, 6) -#define VAL_ARCH Field(config, 7) +#define VAL_IOMMU_OPTS Field(config, 3) +#define VAL_MAX_VCPUS Field(config, 4) +#define VAL_MAX_EVTCHN_PORT Field(config, 5) +#define VAL_MAX_GRANT_FRAMES Field(config, 6) +#define VAL_MAX_MAPTRACK_FRAMES Field(config, 7) +#define VAL_ARCH Field(config, 8) uint32_t domid = 0; int result; @@ -213,6 +214,11 @@ CAMLprim value stub_xc_domain_create(value xch, value config) /* ! XEN_DOMCTL_CDF_ XEN_DOMCTL_CDF_MAX max */ (VAL_FLAGS); + cfg.iommu_opts = ocaml_list_to_c_bitmap + /* ! domain_create_iommu_opts IOMMU_ lc */ + /* ! XEN_DOMCTL_IOMMU_ XEN_DOMCTL_IOMMU_MAX max */ + (VAL_IOMMU_OPTS); + arch_domconfig = Field(VAL_ARCH, 0); switch ( Tag_val(VAL_ARCH) ) { @@ -247,6 +253,7 @@ CAMLprim value stub_xc_domain_create(value xch, value config) #undef VAL_MAX_GRANT_FRAMES #undef VAL_MAX_EVTCHN_PORT #undef VAL_MAX_VCPUS +#undef VAL_IOMMU_OPTS #undef VAL_FLAGS #undef VAL_HANDLE #undef VAL_SSIDREF diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index 293f5f730e..4b2baa0403 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -1461,6 +1461,107 @@ void parse_config_data(const char *config_source, exit(1); } + if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) { + d_config->num_pcidevs = 0; + d_config->pcidevs = NULL; + for(i = 0; (buf = xlu_cfg_get_listitem (pcis, i)) != NULL; i++) { + libxl_device_pci *pcidev; + + pcidev = ARRAY_EXTEND_INIT_NODEVID(d_config->pcidevs, + d_config->num_pcidevs, + libxl_device_pci_init); + pcidev->msitranslate = pci_msitranslate; + pcidev->power_mgmt = pci_power_mgmt; + pcidev->permissive = pci_permissive; + pcidev->seize = pci_seize; + /* + * Like other pci option, the per-device policy always follows + * the global policy by default. + */ + pcidev->rdm_policy = b_info->u.hvm.rdm.policy; + e = xlu_pci_parse_bdf(config, pcidev, buf); + if (e) { + fprintf(stderr, + "unable to parse PCI BDF `%s' for passthrough\n", + buf); + exit(-e); + } + } + if (d_config->num_pcidevs && c_info->type == LIBXL_DOMAIN_TYPE_PV) + libxl_defbool_set(&b_info->u.pv.e820_host, true); + } + + if (!xlu_cfg_get_list (config, "dtdev", &dtdevs, 0, 0)) { + d_config->num_dtdevs = 0; + d_config->dtdevs = NULL; + for (i = 0; (buf = xlu_cfg_get_listitem(dtdevs, i)) != NULL; i++) { + libxl_device_dtdev *dtdev; + + dtdev = ARRAY_EXTEND_INIT_NODEVID(d_config->dtdevs, + d_config->num_dtdevs, + libxl_device_dtdev_init); + + dtdev->path = strdup(buf); + if (dtdev->path == NULL) { + fprintf(stderr, "unable to duplicate string for dtdevs\n"); + exit(-1); + } + } + } + + if (xlu_cfg_get_string(config, "passthrough", &buf, 0)) { + buf = (d_config->num_pcidevs || d_config->num_dtdevs) + ? "enabled" : "disabled"; + } + + if (!strncmp(buf, "enabled", strlen(buf))) { + /* Choose a suitable default */ + c_info->passthrough = + (c_info->type == LIBXL_DOMAIN_TYPE_PV) || !iommu_hap_pt_share + ? LIBXL_PASSTHROUGH_SYNC_PT : LIBXL_PASSTHROUGH_SHARE_PT; + } else { + libxl_passthrough o; + + e = libxl_passthrough_from_string(buf, &o); + if (e) { + fprintf(stderr, + "ERROR: unknown passthrough option '%s'\n", + buf); + exit(-ERROR_FAIL); + } + + switch (o) { + case LIBXL_PASSTHROUGH_DISABLED: + if (d_config->num_pcidevs || d_config->num_dtdevs) { + fprintf(stderr, + "ERROR: passthrough disabled but devices are specified\n"); + exit(-ERROR_FAIL); + } + break; + case LIBXL_PASSTHROUGH_SHARE_PT: + if (c_info->type == LIBXL_DOMAIN_TYPE_PV) { + fprintf(stderr, + "ERROR: passthrough=\"share_pt\" not valid for PV domain\n"); + exit(-ERROR_FAIL); + } else if (!iommu_hap_pt_share) { + fprintf(stderr, + "ERROR: passthrough=\"share_pt\" not supported on this platform\n"); + exit(-ERROR_FAIL); + } + break; + case LIBXL_PASSTHROUGH_SYNC_PT: + break; + } + + c_info->passthrough = o; + } + + if ((c_info->passthrough != LIBXL_PASSTHROUGH_DISABLED) && !iommu_enabled) { + fprintf(stderr, + "ERROR: passthrough not supported on this platform\n"); + exit(-ERROR_FAIL); + } + /* libxl_get_required_shadow_memory() and * libxl_get_required_iommu_memory() must be called after final values * (default or specified) for vcpus and memory are set, because the @@ -1470,11 +1571,10 @@ void parse_config_data(const char *config_source, : libxl_get_required_shadow_memory(b_info->max_memkb, b_info->max_vcpus); - /* No IOMMU reservation is needed if either the IOMMU is disabled or it - * can share the P2M. */ - b_info->iommu_memkb = (!iommu_enabled || iommu_hap_pt_share) - ? 0 - : libxl_get_required_iommu_memory(b_info->max_memkb); + /* No IOMMU reservation is needed if passthrough mode is not 'sync_pt' */ + b_info->iommu_memkb = (c_info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT) + ? libxl_get_required_iommu_memory(b_info->max_memkb) + : 0; xlu_cfg_get_defbool(config, "nomigrate", &b_info->disable_migrate, 0); @@ -2298,54 +2398,6 @@ skip_vfb: } } - if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) { - d_config->num_pcidevs = 0; - d_config->pcidevs = NULL; - for(i = 0; (buf = xlu_cfg_get_listitem (pcis, i)) != NULL; i++) { - libxl_device_pci *pcidev; - - pcidev = ARRAY_EXTEND_INIT_NODEVID(d_config->pcidevs, - d_config->num_pcidevs, - libxl_device_pci_init); - pcidev->msitranslate = pci_msitranslate; - pcidev->power_mgmt = pci_power_mgmt; - pcidev->permissive = pci_permissive; - pcidev->seize = pci_seize; - /* - * Like other pci option, the per-device policy always follows - * the global policy by default. - */ - pcidev->rdm_policy = b_info->u.hvm.rdm.policy; - e = xlu_pci_parse_bdf(config, pcidev, buf); - if (e) { - fprintf(stderr, - "unable to parse PCI BDF `%s' for passthrough\n", - buf); - exit(-e); - } - } - if (d_config->num_pcidevs && c_info->type == LIBXL_DOMAIN_TYPE_PV) - libxl_defbool_set(&b_info->u.pv.e820_host, true); - } - - if (!xlu_cfg_get_list (config, "dtdev", &dtdevs, 0, 0)) { - d_config->num_dtdevs = 0; - d_config->dtdevs = NULL; - for (i = 0; (buf = xlu_cfg_get_listitem(dtdevs, i)) != NULL; i++) { - libxl_device_dtdev *dtdev; - - dtdev = ARRAY_EXTEND_INIT_NODEVID(d_config->dtdevs, - d_config->num_dtdevs, - libxl_device_dtdev_init); - - dtdev->path = strdup(buf); - if (dtdev->path == NULL) { - fprintf(stderr, "unable to duplicate string for dtdevs\n"); - exit(-1); - } - } - } - if (!xlu_cfg_get_list(config, "usbctrl", &usbctrls, 0, 0)) { d_config->num_usbctrls = 0; d_config->usbctrls = NULL; diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index ae13e47e86..61d35cd120 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -617,6 +617,14 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) return -EINVAL; } + /* The P2M table must always be shared between the CPU and the IOMMU */ + if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept ) + { + dprintk(XENLOG_INFO, + "Unsupported iommu option: XEN_DOMCTL_IOMMU_no_sharept\n"); + return -EINVAL; + } + /* Fill in the native GIC version, passed back to the toolstack. */ if ( config->arch.gic_version == XEN_DOMCTL_CONFIG_GIC_NATIVE ) { @@ -677,7 +685,7 @@ int arch_domain_create(struct domain *d, ASSERT(config != NULL); /* p2m_init relies on some value initialized by the IOMMU subsystem */ - if ( (rc = iommu_domain_init(d)) != 0 ) + if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 ) goto fail; if ( (rc = p2m_init(d)) != 0 ) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 4b0ad5e15d..fb8b397be2 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -611,7 +611,7 @@ int arch_domain_create(struct domain *d, if ( (rc = init_domain_irq_mapping(d)) != 0 ) goto fail; - if ( (rc = iommu_domain_init(d)) != 0 ) + if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 ) goto fail; psr_domain_init(d); diff --git a/xen/common/domain.c b/xen/common/domain.c index 4681f29c8b..0733ee8b0a 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -308,6 +308,13 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) return -EINVAL; } + if ( !(config->flags & XEN_DOMCTL_CDF_iommu) && config->iommu_opts ) + { + dprintk(XENLOG_INFO, + "IOMMU options specified but IOMMU not enabled\n"); + return -EINVAL; + } + if ( config->max_vcpus < 1 ) { dprintk(XENLOG_INFO, "No vCPUS\n"); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 5dcfe3c8f6..6e6e9b9866 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -515,19 +515,6 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) rover = dom; } - /* - * For now, make sure the createdomain IOMMU flag is set if the - * IOMMU is enabled. When the flag comes under toolstack control - * this can go away. - */ - if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_iommu ) - { - ASSERT_UNREACHABLE(); - return -EINVAL; - } - if ( iommu_enabled ) - op->u.createdomain.flags |= XEN_DOMCTL_CDF_iommu; - d = domain_create(dom, &op->u.createdomain, false); if ( IS_ERR(d) ) { diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index a49346394d..4171c3cf6f 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -168,7 +168,7 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d) iommu_hwdom_strict = true; } -int iommu_domain_init(struct domain *d) +int iommu_domain_init(struct domain *d, unsigned int opts) { struct domain_iommu *hd = dom_iommu(d); int ret = 0; @@ -192,6 +192,15 @@ int iommu_domain_init(struct domain *d) if ( is_hardware_domain(d) ) check_hwdom_reqs(d); /* may modify iommu_hwdom_strict */ + /* + * Use shared page tables for HAP and IOMMU if the global option + * is enabled (from which we can infer the h/w is capable) and + * the domain options do not disallow it. HAP must, of course, also + * be enabled. + */ + hd->hap_pt_share = hap_enabled(d) && iommu_hap_pt_share && + !(opts & XEN_DOMCTL_IOMMU_no_sharept); + /* * NB: 'relaxed' h/w domains don't need the IOMMU mappings to be kept * in-sync with their assigned pages because all host RAM will be @@ -200,6 +209,8 @@ int iommu_domain_init(struct domain *d) if ( !is_hardware_domain(d) || iommu_hwdom_strict ) hd->need_sync = !iommu_use_hap_pt(d); + ASSERT(!(hd->need_sync && hd->hap_pt_share)); + return 0; } diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 1b3176adb5..ba84aea6ab 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -38,7 +38,7 @@ #include "hvm/save.h" #include "memory.h" -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000011 +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012 /* * NB. xen_domctl.domain is an IN/OUT parameter for this operation. @@ -73,6 +73,14 @@ struct xen_domctl_createdomain { uint32_t flags; +#define _XEN_DOMCTL_IOMMU_no_sharept 0 +#define XEN_DOMCTL_IOMMU_no_sharept (1U << _XEN_DOMCTL_IOMMU_no_sharept) + +/* Max XEN_DOMCTL_IOMMU_* constant. Used for ABI checking. */ +#define XEN_DOMCTL_IOMMU_MAX XEN_DOMCTL_IOMMU_no_sharept + + uint32_t iommu_opts; + /* * Various domain limits, which impact the quantity of resources (global * mapping space, xenheap, etc) a guest may consume. diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h index 8b985f92b0..ad556f7d54 100644 --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -75,7 +75,7 @@ extern unsigned int iommu_dev_iotlb_timeout; int iommu_setup(void); int iommu_hardware_setup(void); -int iommu_domain_init(struct domain *d); +int iommu_domain_init(struct domain *d, unsigned int opts); void iommu_hwdom_init(struct domain *d); void iommu_domain_destroy(struct domain *d); @@ -269,10 +269,14 @@ struct domain_iommu { /* Features supported by the IOMMU */ DECLARE_BITMAP(features, IOMMU_FEAT_count); + /* Does the guest share HAP mapping with the IOMMU? */ + bool hap_pt_share; + /* - * Does the guest reqire mappings to be synchonized, to maintain - * the default dfn == pfn map. (See comment on dfn at the top of - * include/xen/mm.h). + * Does the guest require mappings to be synchronized, to maintain + * the default dfn == pfn map? (See comment on dfn at the top of + * include/xen/mm.h). Note that hap_pt_share == false does not + * necessarily imply this is true. */ bool need_sync; }; @@ -282,8 +286,7 @@ struct domain_iommu { #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features) /* Are we using the domain P2M table as its IOMMU pagetable? */ -#define iommu_use_hap_pt(d) \ - (hap_enabled(d) && is_iommu_enabled(d) && iommu_hap_pt_share) +#define iommu_use_hap_pt(d) (dom_iommu(d)->hap_pt_share) /* Does the IOMMU pagetable need to be kept synchronized with the P2M */ #ifdef CONFIG_HAS_PASSTHROUGH