diff mbox series

[v5,02/15] iommu: Report domain nesting info

Message ID 1594552870-55687-3-git-send-email-yi.l.liu@intel.com (mailing list archive)
State New, archived
Headers show
Series vfio: expose virtual Shared Virtual Addressing to VMs | expand

Commit Message

Yi Liu July 12, 2020, 11:20 a.m. UTC
IOMMUs that support nesting translation needs report the capability info
to userspace, e.g. the format of first level/stage paging structures.

This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
nesting info after setting DOMAIN_ATTR_NESTING.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.

v3 -> v4:
*) split the SMMU driver changes to be a separate patch
*) move the @addr_width and @pasid_bits from vendor specific
   part to generic part.
*) tweak the description for the @features field of struct
   iommu_nesting_info.
*) add description on the @data[] field of struct iommu_nesting_info

v2 -> v3:
*) remvoe cap/ecap_mask in iommu_nesting_info.
*) reuse DOMAIN_ATTR_NESTING to get nesting info.
*) return an empty iommu_nesting_info for SMMU drivers per Jean'
   suggestion.
---
 include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

Comments

Eric Auger July 17, 2020, 4:29 p.m. UTC | #1
Hi Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> IOMMUs that support nesting translation needs report the capability info
s/needs/need to report
> to userspace, e.g. the format of first level/stage paging structures.
It gives information about requirements the userspace needs to implement
plus other features characterizing the physical implementation.
> 
> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> nesting info after setting DOMAIN_ATTR_NESTING.
I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> 
> v3 -> v4:
> *) split the SMMU driver changes to be a separate patch
> *) move the @addr_width and @pasid_bits from vendor specific
>    part to generic part.
> *) tweak the description for the @features field of struct
>    iommu_nesting_info.
> *) add description on the @data[] field of struct iommu_nesting_info
> 
> v2 -> v3:
> *) remvoe cap/ecap_mask in iommu_nesting_info.
> *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> *) return an empty iommu_nesting_info for SMMU drivers per Jean'
>    suggestion.
> ---
>  include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 1afc661..d2a47c4 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
>  	} vendor;
>  };
>  
> +/*
> + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> + *			       user space should check it before using
> + *			       nesting capability.
> + *
> + * @size:	size of the whole structure
> + * @format:	PASID table entry format, the same definition as struct
> + *		iommu_gpasid_bind_data @format.
> + * @features:	supported nesting features.
> + * @flags:	currently reserved for future extension.
> + * @addr_width:	The output addr width of first level/stage translation
> + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> + *		support.
> + * @data:	vendor specific cap info. data[] structure type can be deduced
> + *		from @format field.
> + *
> + * +===============+======================================================+
> + * | feature       |  Notes                                               |
> + * +===============+======================================================+
> + * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
s/in system-wide/system-wide ?
> + * |               |  device. When a device is assigned to userspace or   |
> + * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
> + * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
> + * |               |  the assigned device.          
Isn't it possible to be more explicit, something like:
                      |
System-wide PASID management is mandated by the physical IOMMU. All
PASIDs allocation must be mediated through the TBD API.
> + * +---------------+------------------------------------------------------+
> + * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
> + * |               |  explicitly bind the page table to associated PASID  |
> + * |               |  (either the one specified in bind request or the    |
> + * |               |  default PASID of iommu domain), through userspace   |
> + * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
understand ARM would not expose that BIND_PGTBL nesting feature, I still
think the above wording is a bit confusing. Maybe you may explicitly
talk about the PASID *entry* that needs to be passed from guest to host.
On ARM we directly pass the PASID table but when reading the above
description I fail to determine if this does not fit that description.
> + * +---------------+------------------------------------------------------+
> + * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
> + * |               |  explicitly invalidate the IOMMU cache through uAPI  |
> + * |               |  provided by userspace driver framework (e.g. VFIO)  |
> + * |               |  according to vendor-specific requirement when       |
> + * |               |  changing the page table.                            |
> + * +---------------+------------------------------------------------------+

instead of using the "uAPI provided by userspace driver framework (e.g.
VFIO)", can't we use the so-called IOMMU UAPI terminology which now has
a userspace documentation?

> + *
> + * @data[] types defined for @format:
> + * +================================+=====================================+
> + * | @format                        | @data[]                             |
> + * +================================+=====================================+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
> + * +--------------------------------+-------------------------------------+
> + *
> + */
> +struct iommu_nesting_info {
> +	__u32	size;
shouldn't it be @argsz to fit the iommu uapi convention and take benefit
to put the flags field just below?
> +	__u32	format;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> +	__u32	features;
> +	__u32	flags;
> +	__u16	addr_width;
> +	__u16	pasid_bits;
> +	__u32	padding;
> +	__u8	data[];
> +};
> +
> +/*
> + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> + *
> + * @flags:	VT-d specific flags. Currently reserved for future
> + *		extension.
must be set to 0?
> + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> + *		register.
> + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> + *		extended capability register.
> + */
> +struct iommu_nesting_info_vtd {
> +	__u32	flags;
> +	__u32	padding;
> +	__u64	cap_reg;
> +	__u64	ecap_reg;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */
Thanks

Eric
>
Yi Liu July 20, 2020, 7:20 a.m. UTC | #2
Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Saturday, July 18, 2020 12:29 AM
> 
> Hi Yi,
> 
> On 7/12/20 1:20 PM, Liu Yi L wrote:
> > IOMMUs that support nesting translation needs report the capability info
> s/needs/need to report

yep.

> > to userspace, e.g. the format of first level/stage paging structures.
> It gives information about requirements the userspace needs to implement
> plus other features characterizing the physical implementation.

got it. will add it in next version.

> >
> > This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> > nesting info after setting DOMAIN_ATTR_NESTING.
> I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?

yes, it is. ok, perhaps, it's better to say get nesting info after selecting
VFIO_TYPE1_NESTING_IOMMU.

> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> >
> > v3 -> v4:
> > *) split the SMMU driver changes to be a separate patch
> > *) move the @addr_width and @pasid_bits from vendor specific
> >    part to generic part.
> > *) tweak the description for the @features field of struct
> >    iommu_nesting_info.
> > *) add description on the @data[] field of struct iommu_nesting_info
> >
> > v2 -> v3:
> > *) remvoe cap/ecap_mask in iommu_nesting_info.
> > *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> > *) return an empty iommu_nesting_info for SMMU drivers per Jean'
> >    suggestion.
> > ---
> >  include/uapi/linux/iommu.h | 77
> ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 77 insertions(+)
> >
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 1afc661..d2a47c4 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
> >  	} vendor;
> >  };
> >
> > +/*
> > + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> > + *			       user space should check it before using
> > + *			       nesting capability.
> > + *
> > + * @size:	size of the whole structure
> > + * @format:	PASID table entry format, the same definition as struct
> > + *		iommu_gpasid_bind_data @format.
> > + * @features:	supported nesting features.
> > + * @flags:	currently reserved for future extension.
> > + * @addr_width:	The output addr width of first level/stage translation
> > + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> > + *		support.
> > + * @data:	vendor specific cap info. data[] structure type can be deduced
> > + *		from @format field.
> > + *
> > + *
> +===============+===================================================
> ===+
> > + * | feature       |  Notes                                               |
> > + *
> +===============+===================================================
> ===+
> > + * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
> s/in system-wide/system-wide ?

got it.

> > + * |               |  device. When a device is assigned to userspace or   |
> > + * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
> > + * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
> > + * |               |  the assigned device.
> Isn't it possible to be more explicit, something like:
>                       |
> System-wide PASID management is mandated by the physical IOMMU. All
> PASIDs allocation must be mediated through the TBD API.

yep, I can add it.

> > + * +---------------+------------------------------------------------------+
> > + * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
> > + * |               |  explicitly bind the page table to associated PASID  |
> > + * |               |  (either the one specified in bind request or the    |
> > + * |               |  default PASID of iommu domain), through userspace   |
> > + * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
> As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
> understand ARM would not expose that BIND_PGTBL nesting feature,

yes, that's my point.

> I still
> think the above wording is a bit confusing. Maybe you may explicitly
> talk about the PASID *entry* that needs to be passed from guest to host.
> On ARM we directly pass the PASID table but when reading the above
> description I fail to determine if this does not fit that description.

yes, I can do it.

> > + * +---------------+------------------------------------------------------+
> > + * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
> > + * |               |  explicitly invalidate the IOMMU cache through uAPI  |
> > + * |               |  provided by userspace driver framework (e.g. VFIO)  |
> > + * |               |  according to vendor-specific requirement when       |
> > + * |               |  changing the page table.                            |
> > + * +---------------+------------------------------------------------------+
> 
> instead of using the "uAPI provided by userspace driver framework (e.g.
> VFIO)", can't we use the so-called IOMMU UAPI terminology which now has
> a userspace documentation?

the problem is current IOMMU UAPI definitions is actually embedded in
other VFIO UAPI. if it can make the description more clear, I can follow
your suggestion. :-)

> 
> > + *
> > + * @data[] types defined for @format:
> > + *
> +================================+==================================
> ===+
> > + * | @format                        | @data[]                             |
> > + *
> +================================+==================================
> ===+
> > + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
> > + * +--------------------------------+-------------------------------------+
> > + *
> > + */
> > +struct iommu_nesting_info {
> > +	__u32	size;
> shouldn't it be @argsz to fit the iommu uapi convention and take benefit
> to put the flags field just below?

make sense.

> > +	__u32	format;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > +	__u32	features;
> > +	__u32	flags;
> > +	__u16	addr_width;
> > +	__u16	pasid_bits;
> > +	__u32	padding;
> > +	__u8	data[];
> > +};
> > +
> > +/*
> > + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> > + *
> > + * @flags:	VT-d specific flags. Currently reserved for future
> > + *		extension.
> must be set to 0?

yes. will add it.

Thanks,
Yi Liu

> > + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> > + *		register.
> > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > + *		extended capability register.
> > + */
> > +struct iommu_nesting_info_vtd {
> > +	__u32	flags;
> > +	__u32	padding;
> > +	__u64	cap_reg;
> > +	__u64	ecap_reg;
> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */
> Thanks
> 
> Eric
> >
diff mbox series

Patch

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 1afc661..d2a47c4 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,4 +332,81 @@  struct iommu_gpasid_bind_data {
 	} vendor;
 };
 
+/*
+ * struct iommu_nesting_info - Information for nesting-capable IOMMU.
+ *			       user space should check it before using
+ *			       nesting capability.
+ *
+ * @size:	size of the whole structure
+ * @format:	PASID table entry format, the same definition as struct
+ *		iommu_gpasid_bind_data @format.
+ * @features:	supported nesting features.
+ * @flags:	currently reserved for future extension.
+ * @addr_width:	The output addr width of first level/stage translation
+ * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
+ *		support.
+ * @data:	vendor specific cap info. data[] structure type can be deduced
+ *		from @format field.
+ *
+ * +===============+======================================================+
+ * | feature       |  Notes                                               |
+ * +===============+======================================================+
+ * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
+ * |               |  device. When a device is assigned to userspace or   |
+ * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
+ * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
+ * |               |  the assigned device.                                |
+ * +---------------+------------------------------------------------------+
+ * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
+ * |               |  explicitly bind the page table to associated PASID  |
+ * |               |  (either the one specified in bind request or the    |
+ * |               |  default PASID of iommu domain), through userspace   |
+ * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
+ * +---------------+------------------------------------------------------+
+ * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
+ * |               |  explicitly invalidate the IOMMU cache through uAPI  |
+ * |               |  provided by userspace driver framework (e.g. VFIO)  |
+ * |               |  according to vendor-specific requirement when       |
+ * |               |  changing the page table.                            |
+ * +---------------+------------------------------------------------------+
+ *
+ * @data[] types defined for @format:
+ * +================================+=====================================+
+ * | @format                        | @data[]                             |
+ * +================================+=====================================+
+ * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
+ * +--------------------------------+-------------------------------------+
+ *
+ */
+struct iommu_nesting_info {
+	__u32	size;
+	__u32	format;
+#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
+#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
+	__u32	features;
+	__u32	flags;
+	__u16	addr_width;
+	__u16	pasid_bits;
+	__u32	padding;
+	__u8	data[];
+};
+
+/*
+ * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
+ *
+ * @flags:	VT-d specific flags. Currently reserved for future
+ *		extension.
+ * @cap_reg:	Describe basic capabilities as defined in VT-d capability
+ *		register.
+ * @ecap_reg:	Describe the extended capabilities as defined in VT-d
+ *		extended capability register.
+ */
+struct iommu_nesting_info_vtd {
+	__u32	flags;
+	__u32	padding;
+	__u64	cap_reg;
+	__u64	ecap_reg;
+};
+
 #endif /* _UAPI_IOMMU_H */