Message ID | 1584880325-10561-6-git-send-email-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio: expose virtual Shared Virtual Addressing to VMs | expand |
Hi Yi, Thank you for the patch! Yet something to improve: [auto build test ERROR on vfio/next] [also build test ERROR on v5.6-rc6 next-20200320] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Liu-Yi-L/vfio-expose-virtual-Shared-Virtual-Addressing-to-VMs/20200322-213259 base: https://github.com/awilliam/linux-vfio.git next config: arm64-defconfig (attached as .config) compiler: aarch64-linux-gcc (GCC) 9.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=9.2.0 make.cross ARCH=arm64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> All errors (new ones prefixed by >>): drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_get_stage1_format': >> drivers/vfio/vfio_iommu_type1.c:2273:4: error: 'DOMAIN_ATTR_PASID_FORMAT' undeclared (first use in this function) 2273 | DOMAIN_ATTR_PASID_FORMAT, &format)) { | ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/vfio/vfio_iommu_type1.c:2273:4: note: each undeclared identifier is reported only once for each function it appears in drivers/vfio/vfio_iommu_type1.c: In function 'vfio_iommu_type1_ioctl': drivers/vfio/vfio_iommu_type1.c:2355:11: error: implicit declaration of function 'iommu_get_uapi_version' [-Werror=implicit-function-declaration] 2355 | return iommu_get_uapi_version(); | ^~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +/DOMAIN_ATTR_PASID_FORMAT +2273 drivers/vfio/vfio_iommu_type1.c 2257 2258 static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, 2259 u32 *stage1_format) 2260 { 2261 struct vfio_domain *domain; 2262 u32 format = 0, tmp_format = 0; 2263 int ret; 2264 2265 mutex_lock(&iommu->lock); 2266 if (list_empty(&iommu->domain_list)) { 2267 mutex_unlock(&iommu->lock); 2268 return -EINVAL; 2269 } 2270 2271 list_for_each_entry(domain, &iommu->domain_list, next) { 2272 if (iommu_domain_get_attr(domain->domain, > 2273 DOMAIN_ATTR_PASID_FORMAT, &format)) { 2274 ret = -EINVAL; 2275 format = 0; 2276 goto out_unlock; 2277 } 2278 /* 2279 * format is always non-zero (the first format is 2280 * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For 2281 * the reason of potential different backed IOMMU 2282 * formats, here we expect to have identical formats 2283 * in the domain list, no mixed formats support. 2284 * return -EINVAL to fail the attempt of setup 2285 * VFIO_TYPE1_NESTING_IOMMU if non-identical formats 2286 * are detected. 2287 */ 2288 if (tmp_format && tmp_format != format) { 2289 ret = -EINVAL; 2290 format = 0; 2291 goto out_unlock; 2292 } 2293 2294 tmp_format = format; 2295 } 2296 ret = 0; 2297 2298 out_unlock: 2299 if (format) 2300 *stage1_format = format; 2301 mutex_unlock(&iommu->lock); 2302 return ret; 2303 } 2304 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Sunday, March 22, 2020 8:32 PM > > From: Liu Yi L <yi.l.liu@intel.com> > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > capability to userspace. Thus applications like QEMU could support > vIOMMU with hardware's nesting translation capability for pass-through > devices. Before setting up nesting translation for pass-through devices, > QEMU and other applications need to learn the supported 1st-lvl/stage-1 > translation structure format like page table format. > > Take vSVA (virtual Shared Virtual Addressing) as an example, to support > vSVA for pass-through devices, QEMU setup nesting translation for pass- > through devices. The guest page table are configured to host as 1st-lvl/ > stage-1 page table. Therefore, guest format should be compatible with > host side. > > This patch reports the supported 1st-lvl/stage-1 page table format on the > current platform to userspace. QEMU and other alike applications should > use this format info when trying to setup IOMMU nesting translation on > host IOMMU. > > Cc: Kevin Tian <kevin.tian@intel.com> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com> > Cc: Eric Auger <eric.auger@redhat.com> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > --- > drivers/vfio/vfio_iommu_type1.c | 56 > +++++++++++++++++++++++++++++++++++++++++ > include/uapi/linux/vfio.h | 1 + > 2 files changed, 57 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c > b/drivers/vfio/vfio_iommu_type1.c > index 9aa2a67..82a9e0b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct > vfio_iommu *iommu, > return ret; > } > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > + u32 *stage1_format) > +{ > + struct vfio_domain *domain; > + u32 format = 0, tmp_format = 0; > + int ret; > + > + mutex_lock(&iommu->lock); > + if (list_empty(&iommu->domain_list)) { > + mutex_unlock(&iommu->lock); > + return -EINVAL; > + } > + > + list_for_each_entry(domain, &iommu->domain_list, next) { > + if (iommu_domain_get_attr(domain->domain, > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > + ret = -EINVAL; > + format = 0; > + goto out_unlock; > + } > + /* > + * format is always non-zero (the first format is > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > + * the reason of potential different backed IOMMU > + * formats, here we expect to have identical formats > + * in the domain list, no mixed formats support. > + * return -EINVAL to fail the attempt of setup > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > + * are detected. > + */ > + if (tmp_format && tmp_format != format) { > + ret = -EINVAL; > + format = 0; > + goto out_unlock; > + } > + > + tmp_format = format; > + } this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't want to assume the status quo that one container holds only one device w/ vIOMMU (the prerequisite for vSVA), looks we also need check the format compatibility when attaching a new group to this container? > + ret = 0; > + > +out_unlock: > + if (format) > + *stage1_format = format; > + mutex_unlock(&iommu->lock); > + return ret; > +} > + > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > struct vfio_info_cap *caps) > { > struct vfio_info_cap_header *header; > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > + u32 formats = 0; > + int ret; > + > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > + if (ret) { > + pr_warn("Failed to get stage-1 format\n"); > + return ret; > + } > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, > 1); > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct > vfio_iommu *iommu, > /* nesting iommu type supports PASID requests (alloc/free) > */ > nesting_cap->nesting_capabilities |= > VFIO_IOMMU_PASID_REQS; > } > + nesting_cap->stage1_formats = formats; > > return 0; > } > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index ed9881d..ebeaf3e 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { > struct vfio_info_cap_header header; > #define VFIO_IOMMU_PASID_REQS (1 << 0) > __u32 nesting_capabilities; > + __u32 stage1_formats; do you plan to support multiple formats? If not, use singular name. > }; > > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) > -- > 2.7.4
> From: Tian, Kevin <kevin.tian@intel.com> > Sent: Monday, March 30, 2020 7:49 PM > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > > From: Liu, Yi L <yi.l.liu@intel.com> > > Sent: Sunday, March 22, 2020 8:32 PM > > > > From: Liu Yi L <yi.l.liu@intel.com> > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > > capability to userspace. Thus applications like QEMU could support > > vIOMMU with hardware's nesting translation capability for pass-through > > devices. Before setting up nesting translation for pass-through > > devices, QEMU and other applications need to learn the supported > > 1st-lvl/stage-1 translation structure format like page table format. > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to > > support vSVA for pass-through devices, QEMU setup nesting translation > > for pass- through devices. The guest page table are configured to host > > as 1st-lvl/ > > stage-1 page table. Therefore, guest format should be compatible with > > host side. > > > > This patch reports the supported 1st-lvl/stage-1 page table format on > > the current platform to userspace. QEMU and other alike applications > > should use this format info when trying to setup IOMMU nesting > > translation on host IOMMU. > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com> > > Cc: Eric Auger <eric.auger@redhat.com> > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > --- > > drivers/vfio/vfio_iommu_type1.c | 56 > > +++++++++++++++++++++++++++++++++++++++++ > > include/uapi/linux/vfio.h | 1 + > > 2 files changed, 57 insertions(+) > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct > > vfio_iommu *iommu, > > return ret; > > } > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > + u32 *stage1_format) > > +{ > > + struct vfio_domain *domain; > > + u32 format = 0, tmp_format = 0; > > + int ret; > > + > > + mutex_lock(&iommu->lock); > > + if (list_empty(&iommu->domain_list)) { > > + mutex_unlock(&iommu->lock); > > + return -EINVAL; > > + } > > + > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > + if (iommu_domain_get_attr(domain->domain, > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > > + ret = -EINVAL; > > + format = 0; > > + goto out_unlock; > > + } > > + /* > > + * format is always non-zero (the first format is > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > > + * the reason of potential different backed IOMMU > > + * formats, here we expect to have identical formats > > + * in the domain list, no mixed formats support. > > + * return -EINVAL to fail the attempt of setup > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > > + * are detected. > > + */ > > + if (tmp_format && tmp_format != format) { > > + ret = -EINVAL; > > + format = 0; > > + goto out_unlock; > > + } > > + > > + tmp_format = format; > > + } > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't want to > assume the status quo that one container holds only one device w/ vIOMMU > (the prerequisite for vSVA), looks we also need check the format > compatibility when attaching a new group to this container? right. if attaching to a nesting type container (vfio_iommu.nesting bit indicates it), it should check if it is compabile with prior domains in the domain list. But if it is the first one attached to this container, it's fine. is it good? > > + ret = 0; > > + > > +out_unlock: > > + if (format) > > + *stage1_format = format; > > + mutex_unlock(&iommu->lock); > > + return ret; > > +} > > + > > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > > struct vfio_info_cap *caps) > > { > > struct vfio_info_cap_header *header; > > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > > + u32 formats = 0; > > + int ret; > > + > > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > > + if (ret) { > > + pr_warn("Failed to get stage-1 format\n"); > > + return ret; > > + } > > > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, > > 1); > > @@ -2254,6 +2309,7 @@ static int > > vfio_iommu_info_add_nesting_cap(struct > > vfio_iommu *iommu, > > /* nesting iommu type supports PASID requests (alloc/free) */ > > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; > > } > > + nesting_cap->stage1_formats = formats; > > > > return 0; > > } > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index ed9881d..ebeaf3e 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { > > struct vfio_info_cap_header header; > > #define VFIO_IOMMU_PASID_REQS (1 << 0) > > __u32 nesting_capabilities; > > + __u32 stage1_formats; > > do you plan to support multiple formats? If not, use singular name. I do have such plan. e.g. it may be helpful when one day a platform can support multiple formats. Regards, Yi Liu
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Wednesday, April 1, 2020 3:38 PM > > > From: Tian, Kevin <kevin.tian@intel.com> > > Sent: Monday, March 30, 2020 7:49 PM > > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > > userspace > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > Sent: Sunday, March 22, 2020 8:32 PM > > > > > > From: Liu Yi L <yi.l.liu@intel.com> > > > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > > > capability to userspace. Thus applications like QEMU could support > > > vIOMMU with hardware's nesting translation capability for pass-through > > > devices. Before setting up nesting translation for pass-through > > > devices, QEMU and other applications need to learn the supported > > > 1st-lvl/stage-1 translation structure format like page table format. > > > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to > > > support vSVA for pass-through devices, QEMU setup nesting translation > > > for pass- through devices. The guest page table are configured to host > > > as 1st-lvl/ > > > stage-1 page table. Therefore, guest format should be compatible with > > > host side. > > > > > > This patch reports the supported 1st-lvl/stage-1 page table format on > > > the current platform to userspace. QEMU and other alike applications > > > should use this format info when trying to setup IOMMU nesting > > > translation on host IOMMU. > > > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > > Cc: Alex Williamson <alex.williamson@redhat.com> > > > Cc: Eric Auger <eric.auger@redhat.com> > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > > --- > > > drivers/vfio/vfio_iommu_type1.c | 56 > > > +++++++++++++++++++++++++++++++++++++++++ > > > include/uapi/linux/vfio.h | 1 + > > > 2 files changed, 57 insertions(+) > > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644 > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > @@ -2234,11 +2234,66 @@ static int > vfio_iommu_type1_pasid_free(struct > > > vfio_iommu *iommu, > > > return ret; > > > } > > > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > > + u32 *stage1_format) > > > +{ > > > + struct vfio_domain *domain; > > > + u32 format = 0, tmp_format = 0; > > > + int ret; > > > + > > > + mutex_lock(&iommu->lock); > > > + if (list_empty(&iommu->domain_list)) { > > > + mutex_unlock(&iommu->lock); > > > + return -EINVAL; > > > + } > > > + > > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > > + if (iommu_domain_get_attr(domain->domain, > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > > > + ret = -EINVAL; > > > + format = 0; > > > + goto out_unlock; > > > + } > > > + /* > > > + * format is always non-zero (the first format is > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > > > + * the reason of potential different backed IOMMU > > > + * formats, here we expect to have identical formats > > > + * in the domain list, no mixed formats support. > > > + * return -EINVAL to fail the attempt of setup > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > > > + * are detected. > > > + */ > > > + if (tmp_format && tmp_format != format) { > > > + ret = -EINVAL; > > > + format = 0; > > > + goto out_unlock; > > > + } > > > + > > > + tmp_format = format; > > > + } > > > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't want > to > > assume the status quo that one container holds only one device w/ > vIOMMU > > (the prerequisite for vSVA), looks we also need check the format > > compatibility when attaching a new group to this container? > > right. if attaching to a nesting type container (vfio_iommu.nesting bit > indicates it), it should check if it is compabile with prior domains in > the domain list. But if it is the first one attached to this container, > it's fine. is it good? yes, but my point is whether we should check the format compatibility in the attach path... > > > > + ret = 0; > > > + > > > +out_unlock: > > > + if (format) > > > + *stage1_format = format; > > > + mutex_unlock(&iommu->lock); > > > + return ret; > > > +} > > > + > > > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > > > struct vfio_info_cap *caps) > > > { > > > struct vfio_info_cap_header *header; > > > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > > > + u32 formats = 0; > > > + int ret; > > > + > > > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > > > + if (ret) { > > > + pr_warn("Failed to get stage-1 format\n"); > > > + return ret; > > > + } > > > > > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > > > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, > > > 1); > > > @@ -2254,6 +2309,7 @@ static int > > > vfio_iommu_info_add_nesting_cap(struct > > > vfio_iommu *iommu, > > > /* nesting iommu type supports PASID requests (alloc/free) > */ > > > nesting_cap->nesting_capabilities |= > VFIO_IOMMU_PASID_REQS; > > > } > > > + nesting_cap->stage1_formats = formats; > > > > > > return 0; > > > } > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > > index ed9881d..ebeaf3e 100644 > > > --- a/include/uapi/linux/vfio.h > > > +++ b/include/uapi/linux/vfio.h > > > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { > > > struct vfio_info_cap_header header; > > > #define VFIO_IOMMU_PASID_REQS (1 << 0) > > > __u32 nesting_capabilities; > > > + __u32 stage1_formats; > > > > do you plan to support multiple formats? If not, use singular name. > > I do have such plan. e.g. it may be helpful when one day a platform can > support multiple formats. > > Regards, > Yi Liu
> From: Tian, Kevin <kevin.tian@intel.com> > Sent: Wednesday, April 1, 2020 3:56 PM > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > > From: Liu, Yi L <yi.l.liu@intel.com> > > Sent: Wednesday, April 1, 2020 3:38 PM > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > Sent: Monday, March 30, 2020 7:49 PM > > > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 > > > format to userspace > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > > Sent: Sunday, March 22, 2020 8:32 PM > > > > > > > > From: Liu Yi L <yi.l.liu@intel.com> > > > > > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage > > > > translation) capability to userspace. Thus applications like QEMU > > > > could support vIOMMU with hardware's nesting translation > > > > capability for pass-through devices. Before setting up nesting > > > > translation for pass-through devices, QEMU and other applications > > > > need to learn the supported > > > > 1st-lvl/stage-1 translation structure format like page table format. > > > > > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to > > > > support vSVA for pass-through devices, QEMU setup nesting > > > > translation for pass- through devices. The guest page table are > > > > configured to host as 1st-lvl/ > > > > stage-1 page table. Therefore, guest format should be compatible > > > > with host side. > > > > > > > > This patch reports the supported 1st-lvl/stage-1 page table format > > > > on the current platform to userspace. QEMU and other alike > > > > applications should use this format info when trying to setup > > > > IOMMU nesting translation on host IOMMU. > > > > > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > > > Cc: Alex Williamson <alex.williamson@redhat.com> > > > > Cc: Eric Auger <eric.auger@redhat.com> > > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > > > --- > > > > drivers/vfio/vfio_iommu_type1.c | 56 > > > > +++++++++++++++++++++++++++++++++++++++++ > > > > include/uapi/linux/vfio.h | 1 + > > > > 2 files changed, 57 insertions(+) > > > > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644 > > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > > @@ -2234,11 +2234,66 @@ static int > > vfio_iommu_type1_pasid_free(struct > > > > vfio_iommu *iommu, > > > > return ret; > > > > } > > > > > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > > > + u32 *stage1_format) > > > > +{ > > > > + struct vfio_domain *domain; > > > > + u32 format = 0, tmp_format = 0; > > > > + int ret; > > > > + > > > > + mutex_lock(&iommu->lock); > > > > + if (list_empty(&iommu->domain_list)) { > > > > + mutex_unlock(&iommu->lock); > > > > + return -EINVAL; > > > > + } > > > > + > > > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > > > + if (iommu_domain_get_attr(domain->domain, > > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > > > > + ret = -EINVAL; > > > > + format = 0; > > > > + goto out_unlock; > > > > + } > > > > + /* > > > > + * format is always non-zero (the first format is > > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > > > > + * the reason of potential different backed IOMMU > > > > + * formats, here we expect to have identical formats > > > > + * in the domain list, no mixed formats support. > > > > + * return -EINVAL to fail the attempt of setup > > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > > > > + * are detected. > > > > + */ > > > > + if (tmp_format && tmp_format != format) { > > > > + ret = -EINVAL; > > > > + format = 0; > > > > + goto out_unlock; > > > > + } > > > > + > > > > + tmp_format = format; > > > > + } > > > > > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't > > > want > > to > > > assume the status quo that one container holds only one device w/ > > vIOMMU > > > (the prerequisite for vSVA), looks we also need check the format > > > compatibility when attaching a new group to this container? > > > > right. if attaching to a nesting type container (vfio_iommu.nesting > > bit indicates it), it should check if it is compabile with prior > > domains in the domain list. But if it is the first one attached to > > this container, it's fine. is it good? > > yes, but my point is whether we should check the format compatibility > in the attach path... I guess so. Assume a device has been attached to a container, and userspace has fetched the nesting cap info. e.g. QEMU will have a per-container structure to store the nesting info. And then attach another device from a separate group, if its backend iommu supports different formats, then it will be a problem. If userspace reads the nesting cap info again, it will get a different value. It may affect the prior attched device. If userspace doesn't refresh the nesting info by re-fetch, then the newly added device may use a format which its backend iommu doesn't support. Although, the vendor specific iommu driver should ensure all devices are backed by iommu units w/ same capability (e.g. format). But it would better to have a check in vfio side all the same. how about your opinion so far?:-) Regards, Yi Liu
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Wednesday, April 1, 2020 4:07 PM > > > From: Tian, Kevin <kevin.tian@intel.com> > > Sent: Wednesday, April 1, 2020 3:56 PM > > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > > userspace > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > Sent: Wednesday, April 1, 2020 3:38 PM > > > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > > Sent: Monday, March 30, 2020 7:49 PM > > > > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > > > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 > > > > format to userspace > > > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > > > Sent: Sunday, March 22, 2020 8:32 PM > > > > > > > > > > From: Liu Yi L <yi.l.liu@intel.com> > > > > > > > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage > > > > > translation) capability to userspace. Thus applications like QEMU > > > > > could support vIOMMU with hardware's nesting translation > > > > > capability for pass-through devices. Before setting up nesting > > > > > translation for pass-through devices, QEMU and other applications > > > > > need to learn the supported > > > > > 1st-lvl/stage-1 translation structure format like page table format. > > > > > > > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to > > > > > support vSVA for pass-through devices, QEMU setup nesting > > > > > translation for pass- through devices. The guest page table are > > > > > configured to host as 1st-lvl/ > > > > > stage-1 page table. Therefore, guest format should be compatible > > > > > with host side. > > > > > > > > > > This patch reports the supported 1st-lvl/stage-1 page table format > > > > > on the current platform to userspace. QEMU and other alike > > > > > applications should use this format info when trying to setup > > > > > IOMMU nesting translation on host IOMMU. > > > > > > > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > > > > Cc: Alex Williamson <alex.williamson@redhat.com> > > > > > Cc: Eric Auger <eric.auger@redhat.com> > > > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > > > > --- > > > > > drivers/vfio/vfio_iommu_type1.c | 56 > > > > > +++++++++++++++++++++++++++++++++++++++++ > > > > > include/uapi/linux/vfio.h | 1 + > > > > > 2 files changed, 57 insertions(+) > > > > > > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644 > > > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > > > @@ -2234,11 +2234,66 @@ static int > > > vfio_iommu_type1_pasid_free(struct > > > > > vfio_iommu *iommu, > > > > > return ret; > > > > > } > > > > > > > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > > > > + u32 *stage1_format) > > > > > +{ > > > > > + struct vfio_domain *domain; > > > > > + u32 format = 0, tmp_format = 0; > > > > > + int ret; > > > > > + > > > > > + mutex_lock(&iommu->lock); > > > > > + if (list_empty(&iommu->domain_list)) { > > > > > + mutex_unlock(&iommu->lock); > > > > > + return -EINVAL; > > > > > + } > > > > > + > > > > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > > > > + if (iommu_domain_get_attr(domain->domain, > > > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > > > > > + ret = -EINVAL; > > > > > + format = 0; > > > > > + goto out_unlock; > > > > > + } > > > > > + /* > > > > > + * format is always non-zero (the first format is > > > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). > For > > > > > + * the reason of potential different backed IOMMU > > > > > + * formats, here we expect to have identical formats > > > > > + * in the domain list, no mixed formats support. > > > > > + * return -EINVAL to fail the attempt of setup > > > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical > formats > > > > > + * are detected. > > > > > + */ > > > > > + if (tmp_format && tmp_format != format) { > > > > > + ret = -EINVAL; > > > > > + format = 0; > > > > > + goto out_unlock; > > > > > + } > > > > > + > > > > > + tmp_format = format; > > > > > + } > > > > > > > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we don't > > > > want > > > to > > > > assume the status quo that one container holds only one device w/ > > > vIOMMU > > > > (the prerequisite for vSVA), looks we also need check the format > > > > compatibility when attaching a new group to this container? > > > > > > right. if attaching to a nesting type container (vfio_iommu.nesting > > > bit indicates it), it should check if it is compabile with prior > > > domains in the domain list. But if it is the first one attached to > > > this container, it's fine. is it good? > > > > yes, but my point is whether we should check the format compatibility > > in the attach path... > > I guess so. Assume a device has been attached to a container, and > userspace has fetched the nesting cap info. e.g. QEMU will have a > per-container structure to store the nesting info. And then attach > another device from a separate group, if its backend iommu supports > different formats, then it will be a problem. If userspace reads the > nesting cap info again, it will get a different value. It may affect > the prior attched device. If userspace doesn't refresh the nesting > info by re-fetch, then the newly added device may use a format which > its backend iommu doesn't support. > > Although, the vendor specific iommu driver should ensure all devices > are backed by iommu units w/ same capability (e.g. format). But it > would better to have a check in vfio side all the same. how about your > opinion so far?:-) > I think so.
> From: Tian, Kevin <kevin.tian@intel.com> > Sent: Wednesday, April 1, 2020 4:09 PM > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > > From: Liu, Yi L <yi.l.liu@intel.com> > > Sent: Wednesday, April 1, 2020 4:07 PM > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > Sent: Wednesday, April 1, 2020 3:56 PM > > > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 > > > format to userspace > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > > Sent: Wednesday, April 1, 2020 3:38 PM > > > > > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > > > Sent: Monday, March 30, 2020 7:49 PM > > > > > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com; > > > > > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 > > > > > format to userspace > > > > > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > > > > Sent: Sunday, March 22, 2020 8:32 PM > > > > > > > > > > > > From: Liu Yi L <yi.l.liu@intel.com> > > > > > > > > > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage > > > > > > translation) capability to userspace. Thus applications like > > > > > > QEMU could support vIOMMU with hardware's nesting translation > > > > > > capability for pass-through devices. Before setting up nesting > > > > > > translation for pass-through devices, QEMU and other > > > > > > applications need to learn the supported > > > > > > 1st-lvl/stage-1 translation structure format like page table format. > > > > > > > > > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, > > > > > > to support vSVA for pass-through devices, QEMU setup nesting > > > > > > translation for pass- through devices. The guest page table > > > > > > are configured to host as 1st-lvl/ > > > > > > stage-1 page table. Therefore, guest format should be > > > > > > compatible with host side. > > > > > > > > > > > > This patch reports the supported 1st-lvl/stage-1 page table > > > > > > format on the current platform to userspace. QEMU and other > > > > > > alike applications should use this format info when trying to > > > > > > setup IOMMU nesting translation on host IOMMU. > > > > > > > > > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > > > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > > > > > Cc: Alex Williamson <alex.williamson@redhat.com> > > > > > > Cc: Eric Auger <eric.auger@redhat.com> > > > > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > > > > > --- > > > > > > drivers/vfio/vfio_iommu_type1.c | 56 > > > > > > +++++++++++++++++++++++++++++++++++++++++ > > > > > > include/uapi/linux/vfio.h | 1 + > > > > > > 2 files changed, 57 insertions(+) > > > > > > > > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > > > > b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b > > > > > > 100644 > > > > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > > > > @@ -2234,11 +2234,66 @@ static int > > > > vfio_iommu_type1_pasid_free(struct > > > > > > vfio_iommu *iommu, > > > > > > return ret; > > > > > > } > > > > > > > > > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > > > > > + u32 *stage1_format) > > > > > > +{ > > > > > > + struct vfio_domain *domain; > > > > > > + u32 format = 0, tmp_format = 0; > > > > > > + int ret; > > > > > > + > > > > > > + mutex_lock(&iommu->lock); > > > > > > + if (list_empty(&iommu->domain_list)) { > > > > > > + mutex_unlock(&iommu->lock); > > > > > > + return -EINVAL; > > > > > > + } > > > > > > + > > > > > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > > > > > + if (iommu_domain_get_attr(domain->domain, > > > > > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > > > > > > + ret = -EINVAL; > > > > > > + format = 0; > > > > > > + goto out_unlock; > > > > > > + } > > > > > > + /* > > > > > > + * format is always non-zero (the first format is > > > > > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). > > For > > > > > > + * the reason of potential different backed IOMMU > > > > > > + * formats, here we expect to have identical formats > > > > > > + * in the domain list, no mixed formats support. > > > > > > + * return -EINVAL to fail the attempt of setup > > > > > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical > > formats > > > > > > + * are detected. > > > > > > + */ > > > > > > + if (tmp_format && tmp_format != format) { > > > > > > + ret = -EINVAL; > > > > > > + format = 0; > > > > > > + goto out_unlock; > > > > > > + } > > > > > > + > > > > > > + tmp_format = format; > > > > > > + } > > > > > > > > > > this path is invoked only in VFIO_IOMMU_GET_INFO path. If we > > > > > don't want > > > > to > > > > > assume the status quo that one container holds only one device > > > > > w/ > > > > vIOMMU > > > > > (the prerequisite for vSVA), looks we also need check the format > > > > > compatibility when attaching a new group to this container? > > > > > > > > right. if attaching to a nesting type container > > > > (vfio_iommu.nesting bit indicates it), it should check if it is > > > > compabile with prior domains in the domain list. But if it is the > > > > first one attached to this container, it's fine. is it good? > > > > > > yes, but my point is whether we should check the format > > > compatibility in the attach path... > > > > I guess so. Assume a device has been attached to a container, and > > userspace has fetched the nesting cap info. e.g. QEMU will have a > > per-container structure to store the nesting info. And then attach > > another device from a separate group, if its backend iommu supports > > different formats, then it will be a problem. If userspace reads the > > nesting cap info again, it will get a different value. It may affect > > the prior attched device. If userspace doesn't refresh the nesting > > info by re-fetch, then the newly added device may use a format which > > its backend iommu doesn't support. > > > > Although, the vendor specific iommu driver should ensure all devices > > are backed by iommu units w/ same capability (e.g. format). But it > > would better to have a check in vfio side all the same. how about your > > opinion so far?:-) > > > > I think so. Thanks, :-)
Hi Yi, On 3/22/20 1:32 PM, Liu, Yi L wrote: > From: Liu Yi L <yi.l.liu@intel.com> > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > capability to userspace. Thus applications like QEMU could support > vIOMMU with hardware's nesting translation capability for pass-through > devices. Before setting up nesting translation for pass-through devices, > QEMU and other applications need to learn the supported 1st-lvl/stage-1 > translation structure format like page table format. > > Take vSVA (virtual Shared Virtual Addressing) as an example, to support > vSVA for pass-through devices, QEMU setup nesting translation for pass- > through devices. The guest page table are configured to host as 1st-lvl/ > stage-1 page table. Therefore, guest format should be compatible with > host side. > > This patch reports the supported 1st-lvl/stage-1 page table format on the > current platform to userspace. QEMU and other alike applications should > use this format info when trying to setup IOMMU nesting translation on > host IOMMU. > > Cc: Kevin Tian <kevin.tian@intel.com> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com> > Cc: Eric Auger <eric.auger@redhat.com> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > --- > drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++ > include/uapi/linux/vfio.h | 1 + > 2 files changed, 57 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 9aa2a67..82a9e0b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, > return ret; > } > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > + u32 *stage1_format) vfio_pasid_format() to be homogeneous with vfio_pgsize_bitmap() which does the same kind of enumeration of the vfio_iommu domains > +{ > + struct vfio_domain *domain; > + u32 format = 0, tmp_format = 0; > + int ret; ret = -EINVAL; > + > + mutex_lock(&iommu->lock); > + if (list_empty(&iommu->domain_list)) { goto out_unlock; > + mutex_unlock(&iommu->lock); > + return -EINVAL; > + } > + > + list_for_each_entry(domain, &iommu->domain_list, next) { > + if (iommu_domain_get_attr(domain->domain, > + DOMAIN_ATTR_PASID_FORMAT, &format)) { I can find DOMAIN_ATTR_PASID_FORMAT in Jacob's v9 but not in v10 > + ret = -EINVAL; could be removed > + format = 0; > + goto out_unlock; > + } > + /* > + * format is always non-zero (the first format is > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > + * the reason of potential different backed IOMMU > + * formats, here we expect to have identical formats > + * in the domain list, no mixed formats support. > + * return -EINVAL to fail the attempt of setup > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > + * are detected. > + */ > + if (tmp_format && tmp_format != format) { > + ret = -EINVAL; could be removed > + format = 0; > + goto out_unlock; > + } > + > + tmp_format = format; > + } > + ret = 0; > + > +out_unlock: > + if (format) if (!ret) ? then you can remove the format = 0 in case of error. > + *stage1_format = format; > + mutex_unlock(&iommu->lock); > + return ret; > +} > + > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > struct vfio_info_cap *caps) > { > struct vfio_info_cap_header *header; > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > + u32 formats = 0; > + int ret; > + > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > + if (ret) { > + pr_warn("Failed to get stage-1 format\n"); trace triggered by userspace to be removed? > + return ret; > + } > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > /* nesting iommu type supports PASID requests (alloc/free) */ > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; What is the meaning for ARM? > } > + nesting_cap->stage1_formats = formats; as spotted by Kevin, since a single format is supported, rename > > return 0; > } > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index ed9881d..ebeaf3e 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { > struct vfio_info_cap_header header; > #define VFIO_IOMMU_PASID_REQS (1 << 0) > __u32 nesting_capabilities; > + __u32 stage1_formats; > }; > > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) > Thanks Eric
Hi Eric, > From: Auger Eric <eric.auger@redhat.com> > Sent: Wednesday, April 1, 2020 4:51 PM > To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com > Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > Hi Yi, > On 3/22/20 1:32 PM, Liu, Yi L wrote: > > From: Liu Yi L <yi.l.liu@intel.com> > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > > capability to userspace. Thus applications like QEMU could support > > vIOMMU with hardware's nesting translation capability for pass-through > > devices. Before setting up nesting translation for pass-through devices, > > QEMU and other applications need to learn the supported 1st-lvl/stage-1 > > translation structure format like page table format. > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to support > > vSVA for pass-through devices, QEMU setup nesting translation for pass- > > through devices. The guest page table are configured to host as 1st-lvl/ > > stage-1 page table. Therefore, guest format should be compatible with > > host side. > > > > This patch reports the supported 1st-lvl/stage-1 page table format on the > > current platform to userspace. QEMU and other alike applications should > > use this format info when trying to setup IOMMU nesting translation on > > host IOMMU. > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com> > > Cc: Eric Auger <eric.auger@redhat.com> > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > --- > > drivers/vfio/vfio_iommu_type1.c | 56 > +++++++++++++++++++++++++++++++++++++++++ > > include/uapi/linux/vfio.h | 1 + > > 2 files changed, 57 insertions(+) > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > > index 9aa2a67..82a9e0b 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct > vfio_iommu *iommu, > > return ret; > > } > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > + u32 *stage1_format) > vfio_pasid_format() to be homogeneous with vfio_pgsize_bitmap() which > does the same kind of enumeration of the vfio_iommu domains yes, similar. > > +{ > > + struct vfio_domain *domain; > > + u32 format = 0, tmp_format = 0; > > + int ret; > ret = -EINVAL; got it. > > + > > + mutex_lock(&iommu->lock); > > + if (list_empty(&iommu->domain_list)) { > goto out_unlock; right. > > + mutex_unlock(&iommu->lock); > > + return -EINVAL; > > + } > > + > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > + if (iommu_domain_get_attr(domain->domain, > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > I can find DOMAIN_ATTR_PASID_FORMAT in Jacob's v9 but not in v10 oops, I guess he somehow missed. you may find it in below link. https://github.com/luxis1999/linux-vsva/commit/bf14b11a12f74d58ad3ee626a5d891de395082eb > > + ret = -EINVAL; > could be removed sure. > > + format = 0; > > + goto out_unlock; > > + } > > + /* > > + * format is always non-zero (the first format is > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > > + * the reason of potential different backed IOMMU > > + * formats, here we expect to have identical formats > > + * in the domain list, no mixed formats support. > > + * return -EINVAL to fail the attempt of setup > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > > + * are detected. > > + */ > > + if (tmp_format && tmp_format != format) { > > + ret = -EINVAL; > could be removed got it. > > + format = 0; > > + goto out_unlock; > > + } > > + > > + tmp_format = format; > > + } > > + ret = 0; > > + > > +out_unlock: > > + if (format) > if (!ret) ? then you can remove the format = 0 in case of error. oh, yes. > > + *stage1_format = format; > > + mutex_unlock(&iommu->lock); > > + return ret; > > +} > > + > > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > > struct vfio_info_cap *caps) > > { > > struct vfio_info_cap_header *header; > > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > > + u32 formats = 0; > > + int ret; > > + > > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > > + if (ret) { > > + pr_warn("Failed to get stage-1 format\n"); > trace triggered by userspace to be removed? sure. > > + return ret; > > + } > > > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct > vfio_iommu *iommu, > > /* nesting iommu type supports PASID requests (alloc/free) */ > > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; > What is the meaning for ARM? I think it's just a software capability exposed to userspace, on userspace side, it has a choice to use it or not. :-) The reason define it and report it in cap nesting is that I'd like to make the pasid alloc/free be available just for IOMMU with type VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good for ARM. We can find a proper way to report the availability. > > } > > + nesting_cap->stage1_formats = formats; > as spotted by Kevin, since a single format is supported, rename ok, I was believing it may be possible on ARM or so. :-) will rename it. I'll refine the patch per your above comments. Regards, Yi Liu
Hi Yi, On 4/1/20 2:51 PM, Liu, Yi L wrote: > Hi Eric, > >> From: Auger Eric <eric.auger@redhat.com> >> Sent: Wednesday, April 1, 2020 4:51 PM >> To: Liu, Yi L <yi.l.liu@intel.com>; alex.williamson@redhat.com >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to >> userspace >> >> Hi Yi, >> On 3/22/20 1:32 PM, Liu, Yi L wrote: >>> From: Liu Yi L <yi.l.liu@intel.com> >>> >>> VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) >>> capability to userspace. Thus applications like QEMU could support >>> vIOMMU with hardware's nesting translation capability for pass-through >>> devices. Before setting up nesting translation for pass-through devices, >>> QEMU and other applications need to learn the supported 1st-lvl/stage-1 >>> translation structure format like page table format. >>> >>> Take vSVA (virtual Shared Virtual Addressing) as an example, to support >>> vSVA for pass-through devices, QEMU setup nesting translation for pass- >>> through devices. The guest page table are configured to host as 1st-lvl/ >>> stage-1 page table. Therefore, guest format should be compatible with >>> host side. >>> >>> This patch reports the supported 1st-lvl/stage-1 page table format on the >>> current platform to userspace. QEMU and other alike applications should >>> use this format info when trying to setup IOMMU nesting translation on >>> host IOMMU. >>> >>> Cc: Kevin Tian <kevin.tian@intel.com> >>> CC: Jacob Pan <jacob.jun.pan@linux.intel.com> >>> Cc: Alex Williamson <alex.williamson@redhat.com> >>> Cc: Eric Auger <eric.auger@redhat.com> >>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> >>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> >>> --- >>> drivers/vfio/vfio_iommu_type1.c | 56 >> +++++++++++++++++++++++++++++++++++++++++ >>> include/uapi/linux/vfio.h | 1 + >>> 2 files changed, 57 insertions(+) >>> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>> index 9aa2a67..82a9e0b 100644 >>> --- a/drivers/vfio/vfio_iommu_type1.c >>> +++ b/drivers/vfio/vfio_iommu_type1.c >>> @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct >> vfio_iommu *iommu, >>> return ret; >>> } >>> >>> +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, >>> + u32 *stage1_format) >> vfio_pasid_format() to be homogeneous with vfio_pgsize_bitmap() which >> does the same kind of enumeration of the vfio_iommu domains > > yes, similar. > >>> +{ >>> + struct vfio_domain *domain; >>> + u32 format = 0, tmp_format = 0; >>> + int ret; >> ret = -EINVAL; > > got it. > >>> + >>> + mutex_lock(&iommu->lock); >>> + if (list_empty(&iommu->domain_list)) { >> goto out_unlock; > > right. >>> + mutex_unlock(&iommu->lock); >>> + return -EINVAL; >>> + } >>> + >>> + list_for_each_entry(domain, &iommu->domain_list, next) { >>> + if (iommu_domain_get_attr(domain->domain, >>> + DOMAIN_ATTR_PASID_FORMAT, &format)) { >> I can find DOMAIN_ATTR_PASID_FORMAT in Jacob's v9 but not in v10 > > oops, I guess he somehow missed. you may find it in below link. > > https://github.com/luxis1999/linux-vsva/commit/bf14b11a12f74d58ad3ee626a5d891de395082eb > >>> + ret = -EINVAL; >> could be removed > > sure. > >>> + format = 0; >>> + goto out_unlock; >>> + } >>> + /* >>> + * format is always non-zero (the first format is >>> + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For >>> + * the reason of potential different backed IOMMU >>> + * formats, here we expect to have identical formats >>> + * in the domain list, no mixed formats support. >>> + * return -EINVAL to fail the attempt of setup >>> + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats >>> + * are detected. >>> + */ >>> + if (tmp_format && tmp_format != format) { >>> + ret = -EINVAL; >> could be removed > > got it. > >>> + format = 0; >>> + goto out_unlock; >>> + } >>> + >>> + tmp_format = format; >>> + } >>> + ret = 0; >>> + >>> +out_unlock: >>> + if (format) >> if (!ret) ? then you can remove the format = 0 in case of error. > > oh, yes. > >>> + *stage1_format = format; >>> + mutex_unlock(&iommu->lock); >>> + return ret; >>> +} >>> + >>> static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, >>> struct vfio_info_cap *caps) >>> { >>> struct vfio_info_cap_header *header; >>> struct vfio_iommu_type1_info_cap_nesting *nesting_cap; >>> + u32 formats = 0; >>> + int ret; >>> + >>> + ret = vfio_iommu_get_stage1_format(iommu, &formats); >>> + if (ret) { >>> + pr_warn("Failed to get stage-1 format\n"); >> trace triggered by userspace to be removed? > > sure. > >>> + return ret; >>> + } >>> >>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), >>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); >>> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct >> vfio_iommu *iommu, >>> /* nesting iommu type supports PASID requests (alloc/free) */ >>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; >> What is the meaning for ARM? > > I think it's just a software capability exposed to userspace, on > userspace side, it has a choice to use it or not. :-) The reason > define it and report it in cap nesting is that I'd like to make > the pasid alloc/free be available just for IOMMU with type > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not > good for ARM. We can find a proper way to report the availability. Well it is more a question for jean-Philippe. Do we have a system wide PASID allocation on ARM? Thanks Eric > >>> } >>> + nesting_cap->stage1_formats = formats; >> as spotted by Kevin, since a single format is supported, rename > > ok, I was believing it may be possible on ARM or so. :-) will > rename it. > > I'll refine the patch per your above comments. > > Regards, > Yi Liu >
On Sun, 22 Mar 2020 05:32:02 -0700 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > From: Liu Yi L <yi.l.liu@intel.com> > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > capability to userspace. Thus applications like QEMU could support > vIOMMU with hardware's nesting translation capability for pass-through > devices. Before setting up nesting translation for pass-through devices, > QEMU and other applications need to learn the supported 1st-lvl/stage-1 > translation structure format like page table format. > > Take vSVA (virtual Shared Virtual Addressing) as an example, to support > vSVA for pass-through devices, QEMU setup nesting translation for pass- > through devices. The guest page table are configured to host as 1st-lvl/ > stage-1 page table. Therefore, guest format should be compatible with > host side. > > This patch reports the supported 1st-lvl/stage-1 page table format on the > current platform to userspace. QEMU and other alike applications should > use this format info when trying to setup IOMMU nesting translation on > host IOMMU. > > Cc: Kevin Tian <kevin.tian@intel.com> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com> > Cc: Eric Auger <eric.auger@redhat.com> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > --- > drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++ > include/uapi/linux/vfio.h | 1 + > 2 files changed, 57 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 9aa2a67..82a9e0b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, > return ret; > } > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > + u32 *stage1_format) > +{ > + struct vfio_domain *domain; > + u32 format = 0, tmp_format = 0; > + int ret; > + > + mutex_lock(&iommu->lock); > + if (list_empty(&iommu->domain_list)) { > + mutex_unlock(&iommu->lock); > + return -EINVAL; > + } > + > + list_for_each_entry(domain, &iommu->domain_list, next) { > + if (iommu_domain_get_attr(domain->domain, > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > + ret = -EINVAL; > + format = 0; > + goto out_unlock; > + } > + /* > + * format is always non-zero (the first format is > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > + * the reason of potential different backed IOMMU > + * formats, here we expect to have identical formats > + * in the domain list, no mixed formats support. > + * return -EINVAL to fail the attempt of setup > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > + * are detected. > + */ > + if (tmp_format && tmp_format != format) { > + ret = -EINVAL; > + format = 0; > + goto out_unlock; > + } > + > + tmp_format = format; > + } > + ret = 0; > + > +out_unlock: > + if (format) > + *stage1_format = format; > + mutex_unlock(&iommu->lock); > + return ret; > +} > + > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > struct vfio_info_cap *caps) > { > struct vfio_info_cap_header *header; > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > + u32 formats = 0; > + int ret; > + > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > + if (ret) { > + pr_warn("Failed to get stage-1 format\n"); > + return ret; Looks like this generates a warning and causes the iommu_get_info ioctl to fail if the hardware doesn't support the pasid format attribute, or the domain list is empty. This breaks users on existing hardware. > + } > > header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > /* nesting iommu type supports PASID requests (alloc/free) */ > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; > } > + nesting_cap->stage1_formats = formats; > > return 0; > } > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index ed9881d..ebeaf3e 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { > struct vfio_info_cap_header header; > #define VFIO_IOMMU_PASID_REQS (1 << 0) > __u32 nesting_capabilities; > + __u32 stage1_formats; > }; > > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > >>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > >>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > >>> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct > >> vfio_iommu *iommu, > >>> /* nesting iommu type supports PASID requests (alloc/free) */ > >>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; > >> What is the meaning for ARM? > > > > I think it's just a software capability exposed to userspace, on > > userspace side, it has a choice to use it or not. :-) The reason > > define it and report it in cap nesting is that I'd like to make > > the pasid alloc/free be available just for IOMMU with type > > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not > > good for ARM. We can find a proper way to report the availability. > > Well it is more a question for jean-Philippe. Do we have a system wide > PASID allocation on ARM? We don't, the PASID spaces are per-VM on Arm, so this function should consult the IOMMU driver before setting flags. As you said on patch 3, nested doesn't necessarily imply PASID support. The SMMUv2 does not support PASID but does support nesting stages 1 and 2 for the IOVA space. SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be finer grained: Does the container support: * VFIO_IOMMU_PASID_REQUEST? -> Yes for VT-d 3 -> No for Arm SMMU * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? -> Yes for VT-d 3 -> Sometimes for SMMUv2 -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to PASID tables being in GPA space.) * VFIO_IOMMU_BIND_PASID_TABLE? -> No for VT-d -> Sometimes for SMMUv3 Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > >>> + nesting_cap->stage1_formats = formats; > >> as spotted by Kevin, since a single format is supported, rename > > > > ok, I was believing it may be possible on ARM or so. :-) will > > rename it. Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I guess we could have a secondary vendor capability for these? Thanks, Jean
Hi Alex, > From: Alex Williamson <alex.williamson@redhat.com> > Sent: Friday, April 3, 2020 3:20 AM > To: Liu, Yi L <yi.l.liu@intel.com> > Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > On Sun, 22 Mar 2020 05:32:02 -0700 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > From: Liu Yi L <yi.l.liu@intel.com> > > > > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation) > > capability to userspace. Thus applications like QEMU could support > > vIOMMU with hardware's nesting translation capability for pass-through > > devices. Before setting up nesting translation for pass-through devices, > > QEMU and other applications need to learn the supported 1st-lvl/stage-1 > > translation structure format like page table format. > > > > Take vSVA (virtual Shared Virtual Addressing) as an example, to support > > vSVA for pass-through devices, QEMU setup nesting translation for pass- > > through devices. The guest page table are configured to host as 1st-lvl/ > > stage-1 page table. Therefore, guest format should be compatible with > > host side. > > > > This patch reports the supported 1st-lvl/stage-1 page table format on the > > current platform to userspace. QEMU and other alike applications should > > use this format info when trying to setup IOMMU nesting translation on > > host IOMMU. > > > > Cc: Kevin Tian <kevin.tian@intel.com> > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com> > > Cc: Eric Auger <eric.auger@redhat.com> > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com> > > --- > > drivers/vfio/vfio_iommu_type1.c | 56 > +++++++++++++++++++++++++++++++++++++++++ > > include/uapi/linux/vfio.h | 1 + > > 2 files changed, 57 insertions(+) > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > > index 9aa2a67..82a9e0b 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct > vfio_iommu *iommu, > > return ret; > > } > > > > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, > > + u32 *stage1_format) > > +{ > > + struct vfio_domain *domain; > > + u32 format = 0, tmp_format = 0; > > + int ret; > > + > > + mutex_lock(&iommu->lock); > > + if (list_empty(&iommu->domain_list)) { > > + mutex_unlock(&iommu->lock); > > + return -EINVAL; > > + } > > + > > + list_for_each_entry(domain, &iommu->domain_list, next) { > > + if (iommu_domain_get_attr(domain->domain, > > + DOMAIN_ATTR_PASID_FORMAT, &format)) { > > + ret = -EINVAL; > > + format = 0; > > + goto out_unlock; > > + } > > + /* > > + * format is always non-zero (the first format is > > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For > > + * the reason of potential different backed IOMMU > > + * formats, here we expect to have identical formats > > + * in the domain list, no mixed formats support. > > + * return -EINVAL to fail the attempt of setup > > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats > > + * are detected. > > + */ > > + if (tmp_format && tmp_format != format) { > > + ret = -EINVAL; > > + format = 0; > > + goto out_unlock; > > + } > > + > > + tmp_format = format; > > + } > > + ret = 0; > > + > > +out_unlock: > > + if (format) > > + *stage1_format = format; > > + mutex_unlock(&iommu->lock); > > + return ret; > > +} > > + > > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, > > struct vfio_info_cap *caps) > > { > > struct vfio_info_cap_header *header; > > struct vfio_iommu_type1_info_cap_nesting *nesting_cap; > > + u32 formats = 0; > > + int ret; > > + > > + ret = vfio_iommu_get_stage1_format(iommu, &formats); > > + if (ret) { > > + pr_warn("Failed to get stage-1 format\n"); > > + return ret; > > Looks like this generates a warning and causes the iommu_get_info ioctl > to fail if the hardware doesn't support the pasid format attribute, or > the domain list is empty. This breaks users on existing hardware. oops, yes, it should not fail anything as it is just an extended feature. let me correct it. Thanks, Yi Liu
Hi Jean, > From: Jean-Philippe Brucker <jean-philippe@linaro.org> > Sent: Friday, April 3, 2020 4:23 PM > To: Auger Eric <eric.auger@redhat.com> > userspace > > On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > > >>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > > >>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > @@ -2254,6 +2309,7 > > >>> @@ static int vfio_iommu_info_add_nesting_cap(struct > > >> vfio_iommu *iommu, > > >>> /* nesting iommu type supports PASID requests (alloc/free) */ > > >>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; > > >> What is the meaning for ARM? > > > > > > I think it's just a software capability exposed to userspace, on > > > userspace side, it has a choice to use it or not. :-) The reason > > > define it and report it in cap nesting is that I'd like to make the > > > pasid alloc/free be available just for IOMMU with type > > > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good > > > for ARM. We can find a proper way to report the availability. > > > > Well it is more a question for jean-Philippe. Do we have a system wide > > PASID allocation on ARM? > > We don't, the PASID spaces are per-VM on Arm, so this function should consult the > IOMMU driver before setting flags. As you said on patch 3, nested doesn't > necessarily imply PASID support. The SMMUv2 does not support PASID but does > support nesting stages 1 and 2 for the IOVA space. > SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be > finer grained: > > Does the container support: > * VFIO_IOMMU_PASID_REQUEST? > -> Yes for VT-d 3 > -> No for Arm SMMU > * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > -> Yes for VT-d 3 > -> Sometimes for SMMUv2 > -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to > PASID tables being in GPA space.) > * VFIO_IOMMU_BIND_PASID_TABLE? > -> No for VT-d > -> Sometimes for SMMUv3 > > Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. good summary. do you expect to see any > > > >>> + nesting_cap->stage1_formats = formats; > > >> as spotted by Kevin, since a single format is supported, rename > > > > > > ok, I was believing it may be possible on ARM or so. :-) will rename > > > it. > > Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of > capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW > access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I > guess we could have a secondary vendor capability for these? Actually, I'm wondering if we can define some formats to stands for a set of capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse the capabilities. Regards, Yi Liu
> From: Liu, Yi L > Sent: Tuesday, April 7, 2020 5:43 PM > > > We don't, the PASID spaces are per-VM on Arm, so this function should > > consult the IOMMU driver before setting flags. As you said on patch 3, > > nested doesn't necessarily imply PASID support. The SMMUv2 does not > > support PASID but does support nesting stages 1 and 2 for the IOVA space. > > SMMUv3 support of PASID depends on HW capabilities. So I think this > > needs to be finer grained: > > > > Does the container support: > > * VFIO_IOMMU_PASID_REQUEST? > > -> Yes for VT-d 3 > > -> No for Arm SMMU > > * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > > -> Yes for VT-d 3 > > -> Sometimes for SMMUv2 > > -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to > > PASID tables being in GPA space.) > > * VFIO_IOMMU_BIND_PASID_TABLE? > > -> No for VT-d > > -> Sometimes for SMMUv3 > > > > Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > > good summary. do you expect to see any please ignore this message. I planned to ask if possible to report VFIO_IOMMU_CACHE_INVALIDATE only (no bind support). But I stopped typing it when I came to believe it's unnecessary to report it if there is no bind support. Regards, Yi Liu
Hi Yi, On 4/7/20 11:43 AM, Liu, Yi L wrote: > Hi Jean, > >> From: Jean-Philippe Brucker <jean-philippe@linaro.org> >> Sent: Friday, April 3, 2020 4:23 PM >> To: Auger Eric <eric.auger@redhat.com> >> userspace >> >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: >>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), >>>>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); >> @@ -2254,6 +2309,7 >>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct >>>>> vfio_iommu *iommu, >>>>>> /* nesting iommu type supports PASID requests (alloc/free) */ >>>>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; >>>>> What is the meaning for ARM? >>>> >>>> I think it's just a software capability exposed to userspace, on >>>> userspace side, it has a choice to use it or not. :-) The reason >>>> define it and report it in cap nesting is that I'd like to make the >>>> pasid alloc/free be available just for IOMMU with type >>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good >>>> for ARM. We can find a proper way to report the availability. >>> >>> Well it is more a question for jean-Philippe. Do we have a system wide >>> PASID allocation on ARM? >> >> We don't, the PASID spaces are per-VM on Arm, so this function should consult the >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't >> necessarily imply PASID support. The SMMUv2 does not support PASID but does >> support nesting stages 1 and 2 for the IOVA space. >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be >> finer grained: >> >> Does the container support: >> * VFIO_IOMMU_PASID_REQUEST? >> -> Yes for VT-d 3 >> -> No for Arm SMMU >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? >> -> Yes for VT-d 3 >> -> Sometimes for SMMUv2 >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to >> PASID tables being in GPA space.) >> * VFIO_IOMMU_BIND_PASID_TABLE? >> -> No for VT-d >> -> Sometimes for SMMUv3 >> >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > > good summary. do you expect to see any > >> >>>>>> + nesting_cap->stage1_formats = formats; >>>>> as spotted by Kevin, since a single format is supported, rename >>>> >>>> ok, I was believing it may be possible on ARM or so. :-) will rename >>>> it. >> >> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I >> guess we could have a secondary vendor capability for these? > > Actually, I'm wondering if we can define some formats to stands for a set of > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse > the capabilities. But eventually do we really need all those capability getters? I mean can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() to detect any mismatch? Definitively the error handling may be heavier on userspace but can't we manage. My fear is we end up with an overly complex series. This capability getter may be interesting if we can switch to a fallback implementation but here I guess we don't have any fallback. With smmuv3 nested stage we don't have any fallback solution either. For the versions, it is different because the userspace shall be able to adapt (or not) to the max version supported by the kernel. Thanks Eric > > Regards, > Yi Liu >
On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: > Hi Yi, > > On 4/7/20 11:43 AM, Liu, Yi L wrote: > > Hi Jean, > > > >> From: Jean-Philippe Brucker <jean-philippe@linaro.org> > >> Sent: Friday, April 3, 2020 4:23 PM > >> To: Auger Eric <eric.auger@redhat.com> > >> userspace > >> > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > >>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > >>>>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > >> @@ -2254,6 +2309,7 > >>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct > >>>>> vfio_iommu *iommu, > >>>>>> /* nesting iommu type supports PASID requests (alloc/free) */ > >>>>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; > >>>>> What is the meaning for ARM? > >>>> > >>>> I think it's just a software capability exposed to userspace, on > >>>> userspace side, it has a choice to use it or not. :-) The reason > >>>> define it and report it in cap nesting is that I'd like to make the > >>>> pasid alloc/free be available just for IOMMU with type > >>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good > >>>> for ARM. We can find a proper way to report the availability. > >>> > >>> Well it is more a question for jean-Philippe. Do we have a system wide > >>> PASID allocation on ARM? > >> > >> We don't, the PASID spaces are per-VM on Arm, so this function should consult the > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't > >> necessarily imply PASID support. The SMMUv2 does not support PASID but does > >> support nesting stages 1 and 2 for the IOVA space. > >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be > >> finer grained: > >> > >> Does the container support: > >> * VFIO_IOMMU_PASID_REQUEST? > >> -> Yes for VT-d 3 > >> -> No for Arm SMMU > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > >> -> Yes for VT-d 3 > >> -> Sometimes for SMMUv2 > >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to > >> PASID tables being in GPA space.) > >> * VFIO_IOMMU_BIND_PASID_TABLE? > >> -> No for VT-d > >> -> Sometimes for SMMUv3 > >> > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > > > > good summary. do you expect to see any > > > >> > >>>>>> + nesting_cap->stage1_formats = formats; > >>>>> as spotted by Kevin, since a single format is supported, rename > >>>> > >>>> ok, I was believing it may be possible on ARM or so. :-) will rename > >>>> it. > >> > >> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I > >> guess we could have a secondary vendor capability for these? > > > > Actually, I'm wondering if we can define some formats to stands for a set of > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse > > the capabilities. > > But eventually do we really need all those capability getters? I mean > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() > to detect any mismatch? Definitively the error handling may be heavier > on userspace but can't we manage. I think we need to present these capabilities at boot time, long before the guest triggers a bind(). For example if the host SMMU doesn't support 16-bit ASID, we need to communicate that to the guest using vSMMU ID registers or PROBE properties. Otherwise a bind() will succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events which we'll inject into the guest, for no apparent reason from their perspective. In addition some VMMs may have fallbacks if shared page tables are not available. They could fall back to a MAP/UNMAP interface, or simply not present a vIOMMU to the guest. Thanks, Jean > My fear is we end up with an overly > complex series. This capability getter may be interesting if we can > switch to a fallback implementation but here I guess we don't have any > fallback. With smmuv3 nested stage we don't have any fallback solution > either. For the versions, it is different because the userspace shall be > able to adapt (or not) to the max version supported by the kernel. > > Thanks > > Eric > > > > Regards, > > Yi Liu > > >
Hi Jean, On 4/9/20 10:14 AM, Jean-Philippe Brucker wrote: > On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: >> Hi Yi, >> >> On 4/7/20 11:43 AM, Liu, Yi L wrote: >>> Hi Jean, >>> >>>> From: Jean-Philippe Brucker <jean-philippe@linaro.org> >>>> Sent: Friday, April 3, 2020 4:23 PM >>>> To: Auger Eric <eric.auger@redhat.com> >>>> userspace >>>> >>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: >>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), >>>>>>>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); >>>> @@ -2254,6 +2309,7 >>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct >>>>>>> vfio_iommu *iommu, >>>>>>>> /* nesting iommu type supports PASID requests (alloc/free) */ >>>>>>>> nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; >>>>>>> What is the meaning for ARM? >>>>>> >>>>>> I think it's just a software capability exposed to userspace, on >>>>>> userspace side, it has a choice to use it or not. :-) The reason >>>>>> define it and report it in cap nesting is that I'd like to make the >>>>>> pasid alloc/free be available just for IOMMU with type >>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good >>>>>> for ARM. We can find a proper way to report the availability. >>>>> >>>>> Well it is more a question for jean-Philippe. Do we have a system wide >>>>> PASID allocation on ARM? >>>> >>>> We don't, the PASID spaces are per-VM on Arm, so this function should consult the >>>> IOMMU driver before setting flags. As you said on patch 3, nested doesn't >>>> necessarily imply PASID support. The SMMUv2 does not support PASID but does >>>> support nesting stages 1 and 2 for the IOVA space. >>>> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be >>>> finer grained: >>>> >>>> Does the container support: >>>> * VFIO_IOMMU_PASID_REQUEST? >>>> -> Yes for VT-d 3 >>>> -> No for Arm SMMU >>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? >>>> -> Yes for VT-d 3 >>>> -> Sometimes for SMMUv2 >>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to >>>> PASID tables being in GPA space.) >>>> * VFIO_IOMMU_BIND_PASID_TABLE? >>>> -> No for VT-d >>>> -> Sometimes for SMMUv3 >>>> >>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. >>> >>> good summary. do you expect to see any >>> >>>> >>>>>>>> + nesting_cap->stage1_formats = formats; >>>>>>> as spotted by Kevin, since a single format is supported, rename >>>>>> >>>>>> ok, I was believing it may be possible on ARM or so. :-) will rename >>>>>> it. >>>> >>>> Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of >>>> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW >>>> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I >>>> guess we could have a secondary vendor capability for these? >>> >>> Actually, I'm wondering if we can define some formats to stands for a set of >>> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level >>> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse >>> the capabilities. >> >> But eventually do we really need all those capability getters? I mean >> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() >> to detect any mismatch? Definitively the error handling may be heavier >> on userspace but can't we manage. > > I think we need to present these capabilities at boot time, long before > the guest triggers a bind(). For example if the host SMMU doesn't support > 16-bit ASID, we need to communicate that to the guest using vSMMU ID > registers or PROBE properties. Otherwise a bind() will succeed, but if the > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events > which we'll inject into the guest, for no apparent reason from their > perspective. OK I understand this case as in this situation we may be able to change the way to iommu is exposed to the guest. > > In addition some VMMs may have fallbacks if shared page tables are not > available. They could fall back to a MAP/UNMAP interface, or simply not > present a vIOMMU to the guest. fair enough, there is a need for such capability checker in the mid term. But this patch introduces the capability to check whether system wide PASID alloc is supported and this may not be requested at that stage for the whole vSVM integration? Thanks Eric > > Thanks, > Jean > >> My fear is we end up with an overly >> complex series. This capability getter may be interesting if we can >> switch to a fallback implementation but here I guess we don't have any >> fallback. With smmuv3 nested stage we don't have any fallback solution >> either. For the versions, it is different because the userspace shall be >> able to adapt (or not) to the max version supported by the kernel. >> >> Thanks >> >> Eric >>> >>> Regards, >>> Yi Liu >>> >> >
Hi Jean, > From: Jean-Philippe Brucker <jean-philippe@linaro.org> > Sent: Thursday, April 9, 2020 4:15 PM > Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: > > Hi Yi, > > > > On 4/7/20 11:43 AM, Liu, Yi L wrote: > > > Hi Jean, > > > > > >> From: Jean-Philippe Brucker <jean-philippe@linaro.org> > > >> Sent: Friday, April 3, 2020 4:23 PM > > >> To: Auger Eric <eric.auger@redhat.com> > > >> userspace > > >> > > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > > >>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > > >>>>>> > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > > >> @@ -2254,6 +2309,7 > > >>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct > > >>>>> vfio_iommu *iommu, > > >>>>>> /* nesting iommu type supports PASID requests (alloc/free) > */ > > >>>>>> nesting_cap->nesting_capabilities |= > VFIO_IOMMU_PASID_REQS; > > >>>>> What is the meaning for ARM? > > >>>> > > >>>> I think it's just a software capability exposed to userspace, on > > >>>> userspace side, it has a choice to use it or not. :-) The reason > > >>>> define it and report it in cap nesting is that I'd like to make the > > >>>> pasid alloc/free be available just for IOMMU with type > > >>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good > > >>>> for ARM. We can find a proper way to report the availability. > > >>> > > >>> Well it is more a question for jean-Philippe. Do we have a system wide > > >>> PASID allocation on ARM? > > >> > > >> We don't, the PASID spaces are per-VM on Arm, so this function should consult > the > > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't > > >> necessarily imply PASID support. The SMMUv2 does not support PASID but does > > >> support nesting stages 1 and 2 for the IOVA space. > > >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to > be > > >> finer grained: > > >> > > >> Does the container support: > > >> * VFIO_IOMMU_PASID_REQUEST? > > >> -> Yes for VT-d 3 > > >> -> No for Arm SMMU > > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > > >> -> Yes for VT-d 3 > > >> -> Sometimes for SMMUv2 > > >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to > > >> PASID tables being in GPA space.) > > >> * VFIO_IOMMU_BIND_PASID_TABLE? > > >> -> No for VT-d > > >> -> Sometimes for SMMUv3 > > >> > > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > > > > > > good summary. do you expect to see any > > > > > >> > > >>>>>> + nesting_cap->stage1_formats = formats; > > >>>>> as spotted by Kevin, since a single format is supported, rename > > >>>> > > >>>> ok, I was believing it may be possible on ARM or so. :-) will rename > > >>>> it. > > >> > > >> Yes I don't think an u32 is going to cut it for Arm :( We need to > > >> describe all sorts > of > > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW > > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I > > >> guess we could have a secondary vendor capability for these? > > > > > > Actually, I'm wondering if we can define some formats to stands for a set of > > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level > > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse > > > the capabilities. > > > > But eventually do we really need all those capability getters? I mean > > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() > > to detect any mismatch? Definitively the error handling may be heavier > > on userspace but can't we manage. > > I think we need to present these capabilities at boot time, long before > the guest triggers a bind(). For example if the host SMMU doesn't support > 16-bit ASID, we need to communicate that to the guest using vSMMU ID > registers or PROBE properties. Otherwise a bind() will succeed, but if the > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events > which we'll inject into the guest, for no apparent reason from their > perspective. > > In addition some VMMs may have fallbacks if shared page tables are not > available. They could fall back to a MAP/UNMAP interface, or simply not > present a vIOMMU to the guest. > Based on the comments, I think it would be a need to report iommu caps in detail. So I guess iommu uapi needs to provide something alike vfio cap chain in iommu uapi. Please feel free let me know your thoughts. :-) In vfio, we can define a cap as below: struct vfio_iommu_type1_info_cap_nesting { struct vfio_info_cap_header header; __u64 iommu_model; #define VFIO_IOMMU_PASID_REQS (1 << 0) #define VFIO_IOMMU_BIND_GPASID (1 << 1) #define VFIO_IOMMU_CACHE_INV (1 << 2) __u32 nesting_capabilities; __u32 pasid_bits; #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3) __u32 flags; __u32 data_size; __u8 data[]; /*iommu info caps defined by iommu uapi */ }; VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/ cache_inv/bind_gpasid_table is available or not and also the pasid bits. After that VFIO will ask iommu driver about the iommu_cap_info and fill in the @data[] field. iommu uapi: struct iommu_info_cap_header { __u16 id; /* Identifies capability */ __u16 version; /* Version specific to the capability ID */ __u32 next; /* Offset of next capability */ }; #define IOMMU_INFO_CAP_INTEL_VTD 1 struct iommu_info_cap_intel_vtd { struct iommu_info_cap_header header; __u32 vaddr_width; /* VA addr_width*/ __u32 ipaddr_width; /* IPA addr_width, input of SL page table */ /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */ __u64 flags; }; #define IOMMU_INFO_CAP_ARM_SMMUv3 2 struct iommu_info_cap_arm_smmuv3 { struct iommu_info_cap_header header; ... }; Regards, Yi Liu
Hi Yi, On 4/9/20 2:47 PM, Liu, Yi L wrote: > Hi Jean, > >> From: Jean-Philippe Brucker <jean-philippe@linaro.org> >> Sent: Thursday, April 9, 2020 4:15 PM >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to >> userspace >> >> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: >>> Hi Yi, >>> >>> On 4/7/20 11:43 AM, Liu, Yi L wrote: >>>> Hi Jean, >>>> >>>>> From: Jean-Philippe Brucker <jean-philippe@linaro.org> >>>>> Sent: Friday, April 3, 2020 4:23 PM >>>>> To: Auger Eric <eric.auger@redhat.com> >>>>> userspace >>>>> >>>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: >>>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), >>>>>>>>> >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); >>>>> @@ -2254,6 +2309,7 >>>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct >>>>>>>> vfio_iommu *iommu, >>>>>>>>> /* nesting iommu type supports PASID requests (alloc/free) >> */ >>>>>>>>> nesting_cap->nesting_capabilities |= >> VFIO_IOMMU_PASID_REQS; >>>>>>>> What is the meaning for ARM? >>>>>>> >>>>>>> I think it's just a software capability exposed to userspace, on >>>>>>> userspace side, it has a choice to use it or not. :-) The reason >>>>>>> define it and report it in cap nesting is that I'd like to make the >>>>>>> pasid alloc/free be available just for IOMMU with type >>>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good >>>>>>> for ARM. We can find a proper way to report the availability. >>>>>> >>>>>> Well it is more a question for jean-Philippe. Do we have a system wide >>>>>> PASID allocation on ARM? >>>>> >>>>> We don't, the PASID spaces are per-VM on Arm, so this function should consult >> the >>>>> IOMMU driver before setting flags. As you said on patch 3, nested doesn't >>>>> necessarily imply PASID support. The SMMUv2 does not support PASID but does >>>>> support nesting stages 1 and 2 for the IOVA space. >>>>> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to >> be >>>>> finer grained: >>>>> >>>>> Does the container support: >>>>> * VFIO_IOMMU_PASID_REQUEST? >>>>> -> Yes for VT-d 3 >>>>> -> No for Arm SMMU >>>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? >>>>> -> Yes for VT-d 3 >>>>> -> Sometimes for SMMUv2 >>>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to >>>>> PASID tables being in GPA space.) >>>>> * VFIO_IOMMU_BIND_PASID_TABLE? >>>>> -> No for VT-d >>>>> -> Sometimes for SMMUv3 >>>>> >>>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. >>>> >>>> good summary. do you expect to see any >>>> >>>>> >>>>>>>>> + nesting_cap->stage1_formats = formats; >>>>>>>> as spotted by Kevin, since a single format is supported, rename >>>>>>> >>>>>>> ok, I was believing it may be possible on ARM or so. :-) will rename >>>>>>> it. >>>>> >>>>> Yes I don't think an u32 is going to cut it for Arm :( We need to >>>>> describe all sorts >> of >>>>> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW >>>>> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I >>>>> guess we could have a secondary vendor capability for these? >>>> >>>> Actually, I'm wondering if we can define some formats to stands for a set of >>>> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level >>>> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse >>>> the capabilities. >>> >>> But eventually do we really need all those capability getters? I mean >>> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() >>> to detect any mismatch? Definitively the error handling may be heavier >>> on userspace but can't we manage. >> >> I think we need to present these capabilities at boot time, long before >> the guest triggers a bind(). For example if the host SMMU doesn't support >> 16-bit ASID, we need to communicate that to the guest using vSMMU ID >> registers or PROBE properties. Otherwise a bind() will succeed, but if the >> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events >> which we'll inject into the guest, for no apparent reason from their >> perspective. >> >> In addition some VMMs may have fallbacks if shared page tables are not >> available. They could fall back to a MAP/UNMAP interface, or simply not >> present a vIOMMU to the guest. >> > > Based on the comments, I think it would be a need to report iommu caps > in detail. So I guess iommu uapi needs to provide something alike vfio > cap chain in iommu uapi. Please feel free let me know your thoughts. :-) Yes to me it sounds sensible. > > In vfio, we can define a cap as below: > > struct vfio_iommu_type1_info_cap_nesting { > struct vfio_info_cap_header header; > __u64 iommu_model; > #define VFIO_IOMMU_PASID_REQS (1 << 0) I still think the name shall be changed > #define VFIO_IOMMU_BIND_GPASID (1 << 1) > #define VFIO_IOMMU_CACHE_INV (1 << 2) this operation seems mandated as soon as we have a nested paging based implementation? > __u32 nesting_capabilities; > __u32 pasid_bits; > #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3) > __u32 flags; > __u32 data_size; > __u8 data[]; /*iommu info caps defined by iommu uapi */ > }; > > VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/ > cache_inv/bind_gpasid_table is available or not and also the pasid > bits. After that VFIO will ask iommu driver about the iommu_cap_info > and fill in the @data[] field. > > iommu uapi: > struct iommu_info_cap_header { > __u16 id; /* Identifies capability */ > __u16 version; /* Version specific to the capability ID */ > __u32 next; /* Offset of next capability */ > }; > > #define IOMMU_INFO_CAP_INTEL_VTD 1 > struct iommu_info_cap_intel_vtd { > struct iommu_info_cap_header header; > __u32 vaddr_width; /* VA addr_width*/ > __u32 ipaddr_width; /* IPA addr_width, input of SL page table */ > /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */ > __u64 flags; > }; > > #define IOMMU_INFO_CAP_ARM_SMMUv3 2 > struct iommu_info_cap_arm_smmuv3 { > struct iommu_info_cap_header header; > ... > }; Thanks Eric > > Regards, > Yi Liu >
Hi Eric, > From: Auger Eric <eric.auger@redhat.com> > Sent: Friday, April 10, 2020 11:28 AM > To: Liu, Yi L <yi.l.liu@intel.com>; Jean-Philippe Brucker <jean- > Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > Hi Yi, > > On 4/9/20 2:47 PM, Liu, Yi L wrote: > > Hi Jean, > > > >> From: Jean-Philippe Brucker <jean-philippe@linaro.org> > >> Sent: Thursday, April 9, 2020 4:15 PM > >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 > >> format to userspace > >> > >> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: > >>> Hi Yi, > >>> > >>> On 4/7/20 11:43 AM, Liu, Yi L wrote: > >>>> Hi Jean, > >>>> > >>>>> From: Jean-Philippe Brucker <jean-philippe@linaro.org> > >>>>> Sent: Friday, April 3, 2020 4:23 PM > >>>>> To: Auger Eric <eric.auger@redhat.com> userspace > >>>>> > >>>>> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > >>>>>>>>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > >>>>>>>>> > >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > >>>>> @@ -2254,6 +2309,7 > >>>>>>>>> @@ static int vfio_iommu_info_add_nesting_cap(struct > >>>>>>>> vfio_iommu *iommu, > >>>>>>>>> /* nesting iommu type supports PASID requests > (alloc/free) > >> */ > >>>>>>>>> nesting_cap->nesting_capabilities |= > >> VFIO_IOMMU_PASID_REQS; > >>>>>>>> What is the meaning for ARM? > >>>>>>> > >>>>>>> I think it's just a software capability exposed to userspace, on > >>>>>>> userspace side, it has a choice to use it or not. :-) The reason > >>>>>>> define it and report it in cap nesting is that I'd like to make > >>>>>>> the pasid alloc/free be available just for IOMMU with type > >>>>>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not > >>>>>>> good for ARM. We can find a proper way to report the availability. > >>>>>> > >>>>>> Well it is more a question for jean-Philippe. Do we have a system > >>>>>> wide PASID allocation on ARM? > >>>>> > >>>>> We don't, the PASID spaces are per-VM on Arm, so this function > >>>>> should consult > >> the > >>>>> IOMMU driver before setting flags. As you said on patch 3, nested > >>>>> doesn't necessarily imply PASID support. The SMMUv2 does not > >>>>> support PASID but does support nesting stages 1 and 2 for the IOVA space. > >>>>> SMMUv3 support of PASID depends on HW capabilities. So I think > >>>>> this needs to > >> be > >>>>> finer grained: > >>>>> > >>>>> Does the container support: > >>>>> * VFIO_IOMMU_PASID_REQUEST? > >>>>> -> Yes for VT-d 3 > >>>>> -> No for Arm SMMU > >>>>> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > >>>>> -> Yes for VT-d 3 > >>>>> -> Sometimes for SMMUv2 > >>>>> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler > due to > >>>>> PASID tables being in GPA space.) > >>>>> * VFIO_IOMMU_BIND_PASID_TABLE? > >>>>> -> No for VT-d > >>>>> -> Sometimes for SMMUv3 > >>>>> > >>>>> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > >>>> > >>>> good summary. do you expect to see any > >>>> > >>>>> > >>>>>>>>> + nesting_cap->stage1_formats = formats; > >>>>>>>> as spotted by Kevin, since a single format is supported, rename > >>>>>>> > >>>>>>> ok, I was believing it may be possible on ARM or so. :-) will > >>>>>>> rename it. > >>>>> > >>>>> Yes I don't think an u32 is going to cut it for Arm :( We need to > >>>>> describe all sorts > >> of > >>>>> capabilities for page and PASID tables (granules, GPA size, > >>>>> ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm > >>>>> stage-1 format" wouldn't mean much. I guess we could have a secondary > vendor capability for these? > >>>> > >>>> Actually, I'm wondering if we can define some formats to stands for > >>>> a set of capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may > >>>> indicates the 1st level page table related caps (aw, a/d, SRE, EA > >>>> and etc.). And vIOMMU can parse the capabilities. > >>> > >>> But eventually do we really need all those capability getters? I > >>> mean can't we simply rely on the actual call to > >>> VFIO_IOMMU_BIND_GUEST_PGTBL() to detect any mismatch? Definitively > >>> the error handling may be heavier on userspace but can't we manage. > >> > >> I think we need to present these capabilities at boot time, long > >> before the guest triggers a bind(). For example if the host SMMU > >> doesn't support 16-bit ASID, we need to communicate that to the guest > >> using vSMMU ID registers or PROBE properties. Otherwise a bind() will > >> succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will > >> result in C_BAD_CD events which we'll inject into the guest, for no > >> apparent reason from their perspective. > >> > >> In addition some VMMs may have fallbacks if shared page tables are > >> not available. They could fall back to a MAP/UNMAP interface, or > >> simply not present a vIOMMU to the guest. > >> > > > > Based on the comments, I think it would be a need to report iommu caps > > in detail. So I guess iommu uapi needs to provide something alike vfio > > cap chain in iommu uapi. Please feel free let me know your thoughts. > > :-) > > Yes to me it sounds sensible. > > > > In vfio, we can define a cap as below: > > > > struct vfio_iommu_type1_info_cap_nesting { > > struct vfio_info_cap_header header; > > __u64 iommu_model; > > #define VFIO_IOMMU_PASID_REQS (1 << 0) > I still think the name shall be changed yes, I'll rename it per your suggestion.:-) > > #define VFIO_IOMMU_BIND_GPASID (1 << 1) > > #define VFIO_IOMMU_CACHE_INV (1 << 2) > this operation seems mandated as soon as we have a nested paging based > implementation? oh, yes, should be. will remove it and comment in the code. Regards, Yi Liu > > __u32 nesting_capabilities; > > __u32 pasid_bits; > > #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3) > > __u32 flags; > > __u32 data_size; > > __u8 data[]; /*iommu info caps defined by iommu uapi */ > > }; > > > > VFIO needs new iommu APIs to ask iommu driver whether > > PASID/bind_gpasid/ cache_inv/bind_gpasid_table is available or not and > > also the pasid bits. After that VFIO will ask iommu driver about the > > iommu_cap_info and fill in the @data[] field. > > > > iommu uapi: > > struct iommu_info_cap_header { > > __u16 id; /* Identifies capability */ > > __u16 version; /* Version specific to the capability ID */ > > __u32 next; /* Offset of next capability */ > > }; > > > > #define IOMMU_INFO_CAP_INTEL_VTD 1 > > struct iommu_info_cap_intel_vtd { > > struct iommu_info_cap_header header; > > __u32 vaddr_width; /* VA addr_width*/ > > __u32 ipaddr_width; /* IPA addr_width, input of SL page table */ > > /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */ > > __u64 flags; > > }; > > > > #define IOMMU_INFO_CAP_ARM_SMMUv3 2 > > struct iommu_info_cap_arm_smmuv3 { > > struct iommu_info_cap_header header; > > ... > > }; > > Thanks > > Eric > > > > Regards, > > Yi Liu > >
Hi Jean, Eric, > From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Thursday, April 9, 2020 8:47 PM > Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > [...] > > > >> > > > >> Yes I don't think an u32 is going to cut it for Arm :( We need to > > > >> describe all sorts > > of > > > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, > HW > > > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean > much. I > > > >> guess we could have a secondary vendor capability for these? > > > > > > > > Actually, I'm wondering if we can define some formats to stands for a set of > > > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st > level > > > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse > > > > the capabilities. > > > > > > But eventually do we really need all those capability getters? I mean > > > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() > > > to detect any mismatch? Definitively the error handling may be heavier > > > on userspace but can't we manage. > > > > I think we need to present these capabilities at boot time, long before > > the guest triggers a bind(). For example if the host SMMU doesn't support > > 16-bit ASID, we need to communicate that to the guest using vSMMU ID > > registers or PROBE properties. Otherwise a bind() will succeed, but if the > > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events > > which we'll inject into the guest, for no apparent reason from their > > perspective. > > > > In addition some VMMs may have fallbacks if shared page tables are not > > available. They could fall back to a MAP/UNMAP interface, or simply not > > present a vIOMMU to the guest. > > > > Based on the comments, I think it would be a need to report iommu caps > in detail. So I guess iommu uapi needs to provide something alike vfio > cap chain in iommu uapi. Please feel free let me know your thoughts. :-) Consider more, I guess it may be better to start simpler. Cap chain suits the case in which there are multiple caps. e.g. some vendor iommu driver may want to report iommu capabilities via multiple caps. Actually, in VT-d side, the host IOMMU capability could be reported in a single cap structure. I'm not sure about ARM side. Will there be multiple iommu_info_caps for ARM? > In vfio, we can define a cap as below: > > struct vfio_iommu_type1_info_cap_nesting { > struct vfio_info_cap_header header; > __u64 iommu_model; > #define VFIO_IOMMU_PASID_REQS (1 << 0) > #define VFIO_IOMMU_BIND_GPASID (1 << 1) > #define VFIO_IOMMU_CACHE_INV (1 << 2) > __u32 nesting_capabilities; > __u32 pasid_bits; > #define VFIO_IOMMU_VENDOR_SUB_CAP (1 << 3) > __u32 flags; > __u32 data_size; > __u8 data[]; /*iommu info caps defined by iommu uapi */ > }; > If iommu vendor driver only needs one cap structure to report hw capability, then I think we needn't implement cap chain in iommu uapi. The @data[] field could be determined by the @iommu_model and @flags fields. This would be easier. thoughts? > VFIO needs new iommu APIs to ask iommu driver whether PASID/bind_gpasid/ > cache_inv/bind_gpasid_table is available or not and also the pasid > bits. After that VFIO will ask iommu driver about the iommu_cap_info > and fill in the @data[] field. > > iommu uapi: > struct iommu_info_cap_header { > __u16 id; /* Identifies capability */ > __u16 version; /* Version specific to the capability ID */ > __u32 next; /* Offset of next capability */ > }; > > #define IOMMU_INFO_CAP_INTEL_VTD 1 > struct iommu_info_cap_intel_vtd { > struct iommu_info_cap_header header; > __u32 vaddr_width; /* VA addr_width*/ > __u32 ipaddr_width; /* IPA addr_width, input of SL page table */ > /* same definition with @flags instruct iommu_gpasid_bind_data_vtd */ > __u64 flags; > }; > > #define IOMMU_INFO_CAP_ARM_SMMUv3 2 > struct iommu_info_cap_arm_smmuv3 { > struct iommu_info_cap_header header; > ... > }; Regards, Yi Liu
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, return ret; } +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, + u32 *stage1_format) +{ + struct vfio_domain *domain; + u32 format = 0, tmp_format = 0; + int ret; + + mutex_lock(&iommu->lock); + if (list_empty(&iommu->domain_list)) { + mutex_unlock(&iommu->lock); + return -EINVAL; + } + + list_for_each_entry(domain, &iommu->domain_list, next) { + if (iommu_domain_get_attr(domain->domain, + DOMAIN_ATTR_PASID_FORMAT, &format)) { + ret = -EINVAL; + format = 0; + goto out_unlock; + } + /* + * format is always non-zero (the first format is + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For + * the reason of potential different backed IOMMU + * formats, here we expect to have identical formats + * in the domain list, no mixed formats support. + * return -EINVAL to fail the attempt of setup + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats + * are detected. + */ + if (tmp_format && tmp_format != format) { + ret = -EINVAL; + format = 0; + goto out_unlock; + } + + tmp_format = format; + } + ret = 0; + +out_unlock: + if (format) + *stage1_format = format; + mutex_unlock(&iommu->lock); + return ret; +} + static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, struct vfio_info_cap *caps) { struct vfio_info_cap_header *header; struct vfio_iommu_type1_info_cap_nesting *nesting_cap; + u32 formats = 0; + int ret; + + ret = vfio_iommu_get_stage1_format(iommu, &formats); + if (ret) { + pr_warn("Failed to get stage-1 format\n"); + return ret; + } header = vfio_info_cap_add(caps, sizeof(*nesting_cap), VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, /* nesting iommu type supports PASID requests (alloc/free) */ nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; } + nesting_cap->stage1_formats = formats; return 0; } diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index ed9881d..ebeaf3e 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { struct vfio_info_cap_header header; #define VFIO_IOMMU_PASID_REQS (1 << 0) __u32 nesting_capabilities; + __u32 stage1_formats; }; #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)