
[v9,7/7] vfio/type1: return MSI geometry through VFIO_IOMMU_GET_INFO capability chains

Message ID 1462362858-2925-8-git-send-email-eric.auger@linaro.org (mailing list archive)
State New, archived

Commit Message

Eric Auger May 4, 2016, 11:54 a.m. UTC
This patch allows the user-space to retrieve the MSI geometry. The
implementation is based on capability chains, now also added to
VFIO_IOMMU_GET_INFO.

The returned info comprises:
- whether the MSI IOVAs are constrained to a reserved range (the x86
  case) and, if so, the start/end of the aperture,
- or whether the IOVA aperture needs to be set by userspace. In that
  case, the size and alignment of the IOVA region to be provided are
  returned.

In case userspace must provide the IOVA range, we currently return an
arbitrary number of IOVA pages (16), which is expected to fulfill the
needs of current ARM platforms. This may be superseded by a more
sophisticated computation later on.
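
For illustration, a minimal userspace sketch of how this capability
might be consumed, assuming the uapi layout introduced by this patch
(the exact field layout is still under discussion in the review below;
error handling is omitted):

#include <stdbool.h>	/* this revision still uses bool in the uapi struct */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static void show_msi_geometry(int container_fd)
{
	struct vfio_iommu_type1_info *info;
	size_t argsz = sizeof(*info);
	__u32 off;

	info = calloc(1, argsz);
	info->argsz = argsz;
	ioctl(container_fd, VFIO_IOMMU_GET_INFO, info);

	if (info->argsz > argsz) {
		/* the kernel needs more room to return the capability chain */
		argsz = info->argsz;
		info = realloc(info, argsz);
		memset(info, 0, argsz);
		info->argsz = argsz;
		ioctl(container_fd, VFIO_IOMMU_GET_INFO, info);
	}

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
		goto out;

	/* walk the chain: each header gives the offset of the next cap */
	for (off = info->cap_offset; off; ) {
		struct vfio_info_cap_header *hdr =
			(struct vfio_info_cap_header *)((char *)info + off);

		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY) {
			struct vfio_iommu_type1_info_cap_msi_geometry *geo =
				(struct vfio_iommu_type1_info_cap_msi_geometry *)hdr;

			if (geo->reserved)
				printf("reserved MSI aperture: 0x%llx..0x%llx\n",
				       (unsigned long long)geo->aperture_start,
				       (unsigned long long)geo->aperture_end);
			else
				printf("userspace must map %llu bytes, aligned to %llu\n",
				       (unsigned long long)geo->size,
				       (unsigned long long)geo->alignment);
			break;
		}
		off = hdr->next;
	}
out:
	free(info);
}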

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v8 -> v9:
- use iommu_msi_supported flag instead of programmable
- replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
  capability chain, reporting the MSI geometry

v7 -> v8:
- use iommu_domain_msi_geometry

v6 -> v7:
- remove the computation of the number of IOVA pages to be provisioned.
  This number depends on the domain/group/device topology, which can
  change dynamically. Let's instead rely on an arbitrary maximum
  depending on the system

v4 -> v5:
- move msi_info and ret declaration within the conditional code

v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
  more complex computation of MSI mapping requirements, especially the
  number of pages to be provided by the user-space.
- reword patch title

RFC v1 -> v1:
- derived from
  [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
 drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
 2 files changed, 98 insertions(+), 1 deletion(-)

Comments

Eric Auger May 4, 2016, 12:06 p.m. UTC | #1
Hi Alex,
On 05/04/2016 01:54 PM, Eric Auger wrote:
> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.

If you prefer, we could consider this patch outside of the main series
since it brings extra functionality (MSI geometry reporting). I think a
first QEMU integration could live without knowing the MSI geometry, all
the more so since I currently report an arbitrary number of requested
IOVA pages. Computing the exact number of doorbells to map brings extra
complexity, and I have not addressed this issue yet.

It sketches a possible user API to report the MSI geometry based on the
capability chains, as you suggested some time ago. I am currently busy
drafting a QEMU integration.

Best Regards

Eric
> 
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>   in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
>   case, the size and alignment of the IOVA region to be provided are
>   returned.
> 
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>   capability chain, reporting the MSI geometry
> 
> v7 -> v8:
> - use iommu_domain_msi_geometry
> 
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
>   This number depends on the domain/group/device topology which can
>   dynamically change. Let's rely instead rely on an arbitrary max depending
>   on the system
> 
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
> 
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
>   more complex computation of MSI mapping requirements, especially the
>   number of pages to be provided by the user-space.
> - reword patch title
> 
> RFC v1 -> v1:
> - derived from
>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
>  drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
>  2 files changed, 98 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> +				     struct vfio_info_cap *caps)
> +{
> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> +	struct iommu_domain_msi_geometry msi_geometry;
> +	struct vfio_info_cap_header *header;
> +	struct vfio_domain *d;
> +	bool mapping_required;
> +	size_t size;
> +
> +	mutex_lock(&iommu->lock);
> +	/* All domains have same require_msi_map property, pick first */
> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> +			      &msi_geometry);
> +	mapping_required = msi_geometry.iommu_msi_supported;
> +
> +	mutex_unlock(&iommu->lock);
> +
> +	size = sizeof(*vfio_msi_geometry);
> +	header = vfio_info_cap_add(caps, size,
> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> +	if (IS_ERR(header))
> +		return PTR_ERR(header);
> +
> +	vfio_msi_geometry = container_of(header,
> +				struct vfio_iommu_type1_info_cap_msi_geometry,
> +				header);
> +
> +	vfio_msi_geometry->reserved = !mapping_required;
> +	if (vfio_msi_geometry->reserved) {
> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> +		return 0;
> +	}
> +
> +	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> +	/* we currently report the need for an arbitray number of 16 pages */
> +	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
> +
> +	return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		}
>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>  		struct vfio_iommu_type1_info info;
> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> +		int ret;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>  
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
> +		ret = compute_msi_geometry_caps(iommu, &caps);
> +		if (ret)
> +			return ret;
> +
> +		if (caps.size) {
> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> +			if (info.argsz < sizeof(info) + caps.size) {
> +				info.argsz = sizeof(info) + caps.size;
> +				info.cap_offset = 0;
> +			} else {
> +				vfio_info_cap_shift(&caps, sizeof(info));
> +				if (copy_to_user((void __user *)arg +
> +						sizeof(info), caps.buf,
> +						caps.size)) {
> +					kfree(caps.buf);
> +					return -EFAULT;
> +				}
> +				info.cap_offset = sizeof(info);
> +			}
> +
> +			kfree(caps.buf);
> +		}
> +
>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>  			-EFAULT : 0;
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> +	__u32   cap_offset;	/* Offset within info struct of first cap */
> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + *   whose boundaries are given by [@aperture_start, @aperture_end].
> + *   this is typically the case on x86 host. The userspace is not allowed
> + *   to map userspace memory at IOVAs intersecting this range using
> + *   VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + *   in that case the userspace must provide an IOVA window characterized by
> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> +	struct vfio_info_cap_header header;
> +	bool reserved; /* Are MSI IOVAs within a reserved aperture? */
> +	/* reserved */
> +	__u64 aperture_start;
> +	__u64 aperture_end;
> +	/* not reserved */
> +	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
> +	__u64 alignment; /* alignment of the window, in bytes */
>  };
>  
>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>   * IOVA region that will be used on some platforms to map the host MSI frames.
>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>   * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;
>
Alex Williamson May 9, 2016, 10:49 p.m. UTC | #2
On Wed,  4 May 2016 11:54:18 +0000
Eric Auger <eric.auger@linaro.org> wrote:

> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
> 
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>   in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
>   case, the size and alignment of the IOVA region to be provided are
>   returned.
> 
> In case the userspace must provide the IOVA range, we currently return
> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
> current ARM platforms. This may be deprecated by a more sophisticated
> computation later on.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>   capability chain, reporting the MSI geometry
> 
> v7 -> v8:
> - use iommu_domain_msi_geometry
> 
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
>   This number depends on the domain/group/device topology which can
>   dynamically change. Let's rely instead rely on an arbitrary max depending
>   on the system
> 
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
> 
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
>   more complex computation of MSI mapping requirements, especially the
>   number of pages to be provided by the user-space.
> - reword patch title
> 
> RFC v1 -> v1:
> - derived from
>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
>  drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
>  2 files changed, 98 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2fc8197..841360b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> +				     struct vfio_info_cap *caps)
> +{
> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> +	struct iommu_domain_msi_geometry msi_geometry;
> +	struct vfio_info_cap_header *header;
> +	struct vfio_domain *d;
> +	bool mapping_required;
> +	size_t size;
> +
> +	mutex_lock(&iommu->lock);
> +	/* All domains have same require_msi_map property, pick first */
> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> +			      &msi_geometry);
> +	mapping_required = msi_geometry.iommu_msi_supported;
> +
> +	mutex_unlock(&iommu->lock);
> +
> +	size = sizeof(*vfio_msi_geometry);
> +	header = vfio_info_cap_add(caps, size,
> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> +	if (IS_ERR(header))
> +		return PTR_ERR(header);
> +
> +	vfio_msi_geometry = container_of(header,
> +				struct vfio_iommu_type1_info_cap_msi_geometry,
> +				header);
> +
> +	vfio_msi_geometry->reserved = !mapping_required;
> +	if (vfio_msi_geometry->reserved) {
> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
> +		return 0;
> +	}
> +
> +	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
> +	/* we currently report the need for an arbitray number of 16 pages */
> +	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;

Hmm, that really is arbitrary.  How could we know a real value here?

> +
> +	return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		}
>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>  		struct vfio_iommu_type1_info info;
> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> +		int ret;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>  
> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
> +		ret = compute_msi_geometry_caps(iommu, &caps);
> +		if (ret)
> +			return ret;
> +
> +		if (caps.size) {
> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> +			if (info.argsz < sizeof(info) + caps.size) {
> +				info.argsz = sizeof(info) + caps.size;
> +				info.cap_offset = 0;
> +			} else {
> +				vfio_info_cap_shift(&caps, sizeof(info));
> +				if (copy_to_user((void __user *)arg +
> +						sizeof(info), caps.buf,
> +						caps.size)) {
> +					kfree(caps.buf);
> +					return -EFAULT;
> +				}
> +				info.cap_offset = sizeof(info);
> +			}
> +
> +			kfree(caps.buf);
> +		}
> +
>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>  			-EFAULT : 0;
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..0ff6a8d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> +	__u32   cap_offset;	/* Offset within info struct of first cap */
> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */

This would break existing users; we can't arbitrarily change the offset
of iova_pgsizes.  We can add cap_offset to the end, and I think
everything above would work if we do that.
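
For reference, keeping iova_pgsizes at its existing offset and
appending the new field would presumably look like this (a sketch of
the suggested fix, not the merged uapi):

struct vfio_iommu_type1_info {
	__u32	argsz;
	__u32	flags;
#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
	__u64	iova_pgsizes;	/* Bitmap of supported page sizes, offset unchanged */
	__u32	cap_offset;	/* Offset within info struct of first cap, appended */
};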

> +};
> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + *   whose boundaries are given by [@aperture_start, @aperture_end].
> + *   this is typically the case on x86 host. The userspace is not allowed
> + *   to map userspace memory at IOVAs intersecting this range using
> + *   VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + *   in that case the userspace must provide an IOVA window characterized by
> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> +	struct vfio_info_cap_header header;
> +	bool reserved; /* Are MSI IOVAs within a reserved aperture? */

Do bools have a guaranteed size in the user ABI?  Let's make this a
__u32 and call it flags, with bit 0 defined as reserved.  I'm tempted
to suggest we could figure out how to make alignment fit in another
__u32 so we have a properly packed structure; otherwise we should add
a reserved __u32 for padding.
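
One possible repacking along those lines (purely illustrative; the flag
name and the idea of encoding alignment as a power of two are
assumptions, not something defined by this series):

struct vfio_iommu_type1_info_cap_msi_geometry {
	struct vfio_info_cap_header header;	/* 8 bytes */
#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0)	/* MSI IOVAs in a reserved aperture */
	__u32 flags;
	__u32 order;	/* if !RESERVED: alignment as log2(bytes), folding the old __u64 field */
	/* if RESERVED */
	__u64 aperture_start;
	__u64 aperture_end;
	/* if !RESERVED */
	__u64 size;	/* IOVA window size in bytes userspace must provide */
};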

> +	/* reserved */
> +	__u64 aperture_start;
> +	__u64 aperture_end;
> +	/* not reserved */
> +	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
> +	__u64 alignment; /* alignment of the window, in bytes */
>  };
>  
>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>   * IOVA region that will be used on some platforms to map the host MSI frames.
>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>   * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;
Alex Williamson May 9, 2016, 11:03 p.m. UTC | #3
On Wed, 4 May 2016 14:06:19 +0200
Eric Auger <eric.auger@linaro.org> wrote:

> Hi Alex,
> On 05/04/2016 01:54 PM, Eric Auger wrote:
> > This patch allows the user-space to retrieve the MSI geometry. The
> > implementation is based on capability chains, now also added to
> > VFIO_IOMMU_GET_INFO.  
> 
> If you prefer we could consider this patch outside of the main series
> since it brings extra functionalities (MSI geometry reporting). In a
> first QEMU integration we would live without knowing the MSI geometry I
> think, all the more so I currently report an arbitrary number of
> requested IOVA pages. The computation of the exact number of doorbells
> to map brings extra complexity and I did not address this issue yet.
> 
> It sketches a possible user API to report the MSI geometry based on the
> capability chains, as you suggested some time ago. I am currently busy
> drafting a QEMU integration.

How would the user know that reserved MSI mappings are required or
available without this?  Wouldn't the only option be for userspace to
try to map something with the reserved MSI flag set and see if the
kernel accepts it?  That's not a very desirable programming model.  The
arbitrary size is pretty ugly, but it at least makes for a consistent
user interface.  Is it a functional issue if we overestimate the size,
or is it just a matter of wasting IOVA space?  Is there significant
harm in making it obscenely large, like 1MB?  With the reference
counting and re-use of IOVA pages, it seems we may often be using only
a single IOVA page for multiple doorbells.  I guess I'm leaning towards
defining the API even if the value is somewhat arbitrary, because we'd
rather have control of this than have the user guess and then have to
rope them back in later to use a kernel-recommended value.  Thanks,

Alex
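
For illustration, the "try to map" probe described above would look
roughly like this; the flag name is assumed from earlier patches in
this series (the uapi comment only calls it the RESERVED_MSI_IOVA
flag), and the iova/size would come from the geometry capability or a
guess:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Register an MSI IOVA window when the geometry capability reports
 * that no reserved aperture exists; vaddr is ignored for such mappings.
 */
static int register_msi_iova_window(int container_fd, __u64 iova, __u64 size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,	/* assumed name */
		.iova  = iova,	/* window chosen by userspace, suitably aligned */
		.size  = size,	/* e.g. the size advertised by the capability */
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
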
Eric Auger May 10, 2016, 4:36 p.m. UTC | #4
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
> On Wed,  4 May 2016 11:54:18 +0000
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> This patch allows the user-space to retrieve the MSI geometry. The
>> implementation is based on capability chains, now also added to
>> VFIO_IOMMU_GET_INFO.
>>
>> The returned info comprise:
>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>>   in the positive, the start/end of the aperture,
>> - or whether the IOVA aperture need to be set by the userspace. In that
>>   case, the size and alignment of the IOVA region to be provided are
>>   returned.
>>
>> In case the userspace must provide the IOVA range, we currently return
>> an arbitrary number of IOVA pages (16), supposed to fulfill the needs of
>> current ARM platforms. This may be deprecated by a more sophisticated
>> computation later on.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v8 -> v9:
>> - use iommu_msi_supported flag instead of programmable
>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>   capability chain, reporting the MSI geometry
>>
>> v7 -> v8:
>> - use iommu_domain_msi_geometry
>>
>> v6 -> v7:
>> - remove the computation of the number of IOVA pages to be provisionned.
>>   This number depends on the domain/group/device topology which can
>>   dynamically change. Let's rely instead rely on an arbitrary max depending
>>   on the system
>>
>> v4 -> v5:
>> - move msi_info and ret declaration within the conditional code
>>
>> v3 -> v4:
>> - replace former vfio_domains_require_msi_mapping by
>>   more complex computation of MSI mapping requirements, especially the
>>   number of pages to be provided by the user-space.
>> - reword patch title
>>
>> RFC v1 -> v1:
>> - derived from
>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>> - renamed allow_msi_reconfig into require_msi_mapping
>> - fixed VFIO_IOMMU_GET_INFO
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 69 +++++++++++++++++++++++++++++++++++++++++
>>  include/uapi/linux/vfio.h       | 30 +++++++++++++++++-
>>  2 files changed, 98 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2fc8197..841360b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1134,6 +1134,50 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>  	return ret;
>>  }
>>  
>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>> +				     struct vfio_info_cap *caps)
>> +{
>> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>> +	struct iommu_domain_msi_geometry msi_geometry;
>> +	struct vfio_info_cap_header *header;
>> +	struct vfio_domain *d;
>> +	bool mapping_required;
>> +	size_t size;
>> +
>> +	mutex_lock(&iommu->lock);
>> +	/* All domains have same require_msi_map property, pick first */
>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>> +			      &msi_geometry);
>> +	mapping_required = msi_geometry.iommu_msi_supported;
>> +
>> +	mutex_unlock(&iommu->lock);
>> +
>> +	size = sizeof(*vfio_msi_geometry);
>> +	header = vfio_info_cap_add(caps, size,
>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>> +
>> +	if (IS_ERR(header))
>> +		return PTR_ERR(header);
>> +
>> +	vfio_msi_geometry = container_of(header,
>> +				struct vfio_iommu_type1_info_cap_msi_geometry,
>> +				header);
>> +
>> +	vfio_msi_geometry->reserved = !mapping_required;
>> +	if (vfio_msi_geometry->reserved) {
>> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
>> +		return 0;
>> +	}
>> +
>> +	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
>> +	/* we currently report the need for an arbitray number of 16 pages */
>> +	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
> 
> Hmm, that really is arbitrary.  How could we know a real value here?
Yes, I fully agree, and this is acknowledged in the cover letter/commit
message. I dared to do that because it has the benefit of allowing the
userspace API to be introduced while the computation is refined later on.

I have not yet found an elegant solution for computing the platform's
maximum number/size of doorbells (besides what I did in the past, which
depended on the current group/device topology).

Maybe an option would be to have the relevant MSI controllers register
their doorbells in a global list at probe time, and we would then
enumerate all of them. I was reluctant to add this new functionality to
the series at this stage, hence the current simplification.
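
A purely hypothetical sketch of that registration idea - every symbol
below is invented for illustration and does not exist in the kernel; it
only shows the shape of the interface being suggested:

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/kernel.h>
#include <linux/types.h>

/* Hypothetical registry: MSI controllers would add their doorbell
 * regions here at probe time, so VFIO could size the IOVA window by
 * walking the list instead of guessing.
 */
struct msi_doorbell_region {
	struct list_head list;
	phys_addr_t base;
	size_t size;
};

static LIST_HEAD(msi_doorbell_list);
static DEFINE_MUTEX(msi_doorbell_mutex);

void msi_doorbell_register(struct msi_doorbell_region *db)
{
	mutex_lock(&msi_doorbell_mutex);
	list_add_tail(&db->list, &msi_doorbell_list);
	mutex_unlock(&msi_doorbell_mutex);
}

/* Worst-case number of IOVA pages needed to map every registered doorbell */
unsigned long msi_doorbell_pages(unsigned long pgsize)
{
	struct msi_doorbell_region *db;
	unsigned long pages = 0;

	mutex_lock(&msi_doorbell_mutex);
	list_for_each_entry(db, &msi_doorbell_list, list)
		pages += DIV_ROUND_UP(db->size, pgsize);
	mutex_unlock(&msi_doorbell_mutex);

	return pages;
}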
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  				   unsigned int cmd, unsigned long arg)
>>  {
>> @@ -1155,6 +1199,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  		}
>>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>>  		struct vfio_iommu_type1_info info;
>> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> +		int ret;
>>  
>>  		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>  
>> @@ -1168,6 +1214,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  
>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>  
>> +		ret = compute_msi_geometry_caps(iommu, &caps);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (caps.size) {
>> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
>> +			if (info.argsz < sizeof(info) + caps.size) {
>> +				info.argsz = sizeof(info) + caps.size;
>> +				info.cap_offset = 0;
>> +			} else {
>> +				vfio_info_cap_shift(&caps, sizeof(info));
>> +				if (copy_to_user((void __user *)arg +
>> +						sizeof(info), caps.buf,
>> +						caps.size)) {
>> +					kfree(caps.buf);
>> +					return -EFAULT;
>> +				}
>> +				info.cap_offset = sizeof(info);
>> +			}
>> +
>> +			kfree(caps.buf);
>> +		}
>> +
>>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>>  			-EFAULT : 0;
>>  
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 4a9dbc2..0ff6a8d 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -488,7 +488,33 @@ struct vfio_iommu_type1_info {
>>  	__u32	argsz;
>>  	__u32	flags;
>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
>> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
>> +	__u32   cap_offset;	/* Offset within info struct of first cap */
>> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> 
> This would break existing users, we can't arbitrarily change the offset
> of iova_pgsizes.  We can add cap_offset to the end and I think
> everything would work about above if we do that.
Hum yes, sorry for the lack of care.
> 
>> +};
>> +
>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
>> +
>> +/*
>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
>> + *   this is typically the case on x86 host. The userspace is not allowed
>> + *   to map userspace memory at IOVAs intersecting this range using
>> + *   VFIO_IOMMU_MAP_DMA.
>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>> + *   in that case the userspace must provide an IOVA window characterized by
>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>> + */
>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>> +	struct vfio_info_cap_header header;
>> +	bool reserved; /* Are MSI IOVAs within a reserved aperture? */
> 
> Do bools have a guaranteed user size?  Let's make this a __u32 and call
> it flags with bit 0 defined as reserved.  I'm tempted to suggest we
> could figure out how to make alignment fit in another __u32 so we have a
> properly packed structure, otherwise we should make a reserved __u32.
OK will rewrite & check that.

Thanks

Eric
> 
>> +	/* reserved */
>> +	__u64 aperture_start;
>> +	__u64 aperture_end;
>> +	/* not reserved */
>> +	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
>> +	__u64 alignment; /* alignment of the window, in bytes */
>>  };
>>  
>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>> @@ -503,6 +529,8 @@ struct vfio_iommu_type1_info {
>>   * IOVA region that will be used on some platforms to map the host MSI frames.
>>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>>   * IOVA region stays until the container is closed.
>> + * The requirement for provisioning such reserved IOVA range can be checked by
>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>>   */
>>  struct vfio_iommu_type1_dma_map {
>>  	__u32	argsz;
>
Eric Auger May 10, 2016, 4:50 p.m. UTC | #5
On 05/10/2016 01:03 AM, Alex Williamson wrote:
> On Wed, 4 May 2016 14:06:19 +0200
> Eric Auger <eric.auger@linaro.org> wrote:
> 
>> Hi Alex,
>> On 05/04/2016 01:54 PM, Eric Auger wrote:
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.  
>>
>> If you prefer we could consider this patch outside of the main series
>> since it brings extra functionalities (MSI geometry reporting). In a
>> first QEMU integration we would live without knowing the MSI geometry I
>> think, all the more so I currently report an arbitrary number of
>> requested IOVA pages. The computation of the exact number of doorbells
>> to map brings extra complexity and I did not address this issue yet.
>>
>> It sketches a possible user API to report the MSI geometry based on the
>> capability chains, as you suggested some time ago. I am currently busy
>> drafting a QEMU integration.
> 
> How would the user know that reserved MSI mappings are requires or
> available without this?  Wouldn't the only option be for userspace to
> try to map something with the reserved MSI flag set and see if the
> kernel accepts it?
Well, my first guess was that the (QEMU) virt machine using KVM/PCIe
MSI passthrough could hardcode an arbitrary "large" IOVA size (currently
16 64kB pages in my QEMU integration). If this is not sufficient for
mapping all host doorbells, we would see MSI allocations failing. If the
need shows up, we could increase the value later on.
> That's not a very desirable programming model.  The
> arbitrary size is pretty ugly, but it at least makes for a consistent
> user interface.  Is it a functional issue if we overestimate the size
> or is it just a matter of wasting IOVA space?  Is there significant
> harm in making it obscenely large, like 1MB?
No, it is just a waste of IOVA space. To make it as transparent as
possible for the virt machine, I wanted to hide it in the platform bus
reserved IOVA range.
> The reference counting and
> re-use of IOVA pages seems like we may often only be using a single
> IOVA page for multiple doorbells.  I guess I'm leaning towards defining
> the API even if the value is somewhat arbitrary because we'd rather have
> control of this rather than having the user guess and try to rope them
> back in later to use a kernel recommended value.  Thanks,

OK

Thanks

Eric
> 
> Alex
>

Patch

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2fc8197..841360b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1134,6 +1134,50 @@  static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
+				     struct vfio_info_cap *caps)
+{
+	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
+	struct iommu_domain_msi_geometry msi_geometry;
+	struct vfio_info_cap_header *header;
+	struct vfio_domain *d;
+	bool mapping_required;
+	size_t size;
+
+	mutex_lock(&iommu->lock);
+	/* All domains have same require_msi_map property, pick first */
+	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
+			      &msi_geometry);
+	mapping_required = msi_geometry.iommu_msi_supported;
+
+	mutex_unlock(&iommu->lock);
+
+	size = sizeof(*vfio_msi_geometry);
+	header = vfio_info_cap_add(caps, size,
+				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
+
+	if (IS_ERR(header))
+		return PTR_ERR(header);
+
+	vfio_msi_geometry = container_of(header,
+				struct vfio_iommu_type1_info_cap_msi_geometry,
+				header);
+
+	vfio_msi_geometry->reserved = !mapping_required;
+	if (vfio_msi_geometry->reserved) {
+		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
+		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
+		return 0;
+	}
+
+	vfio_msi_geometry->alignment = 1 << __ffs(vfio_pgsize_bitmap(iommu));
+	/* we currently report the need for an arbitrary number of 16 pages */
+	vfio_msi_geometry->size = 16 * vfio_msi_geometry->alignment;
+
+	return 0;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1155,6 +1199,8 @@  static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 	} else if (cmd == VFIO_IOMMU_GET_INFO) {
 		struct vfio_iommu_type1_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		int ret;
 
 		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
@@ -1168,6 +1214,29 @@  static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
+		ret = compute_msi_geometry_caps(iommu, &caps);
+		if (ret)
+			return ret;
+
+		if (caps.size) {
+			info.flags |= VFIO_IOMMU_INFO_CAPS;
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg +
+						sizeof(info), caps.buf,
+						caps.size)) {
+					kfree(caps.buf);
+					return -EFAULT;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+
 		return copy_to_user((void __user *)arg, &info, minsz) ?
 			-EFAULT : 0;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4a9dbc2..0ff6a8d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,33 @@  struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
-	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
+	__u32   cap_offset;	/* Offset within info struct of first cap */
+	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
+
+/*
+ * The MSI geometry capability reports the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ *   whose boundaries are given by [@aperture_start, @aperture_end].
+ *   This is typically the case on an x86 host. Userspace is not allowed
+ *   to map userspace memory at IOVAs intersecting this range using
+ *   VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not required to belong to any reserved range;
+ *   in that case userspace must provide an IOVA window characterized by
+ *   @size and @alignment using VFIO_IOMMU_MAP_DMA with the RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+	struct vfio_info_cap_header header;
+	bool reserved; /* Are MSI IOVAs within a reserved aperture? */
+	/* reserved */
+	__u64 aperture_start;
+	__u64 aperture_end;
+	/* not reserved */
+	__u64 size; /* IOVA aperture size in bytes the userspace must provide */
+	__u64 alignment; /* alignment of the window, in bytes */
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -503,6 +529,8 @@  struct vfio_iommu_type1_info {
  * IOVA region that will be used on some platforms to map the host MSI frames.
  * In that specific case, vaddr is ignored. Once registered, an MSI reserved
  * IOVA region stays until the container is closed.
+ * The requirement for provisioning such a reserved IOVA range can be
+ * discovered through the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;