| Message ID | 20230426145419.450922-3-yi.l.liu@intel.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | Enhance vfio PCI hot reset for vfio cdev device |
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, April 26, 2023 10:54 PM
>
> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
> {
> 	struct vfio_device *vdev = data;
>
> -	if (vdev->ops->dma_unmap)
> +	/* noiommu devices cannot do map/unmap */
> +	if (vdev->noiommu && vdev->ops->dma_unmap)
> 		vdev->ops->dma_unmap(vdev, iova, length);

Is it necessary? All mdev devices implementing @dma_unmap won't
set the noiommu flag.

Instead, in the future, if we allow noiommu userspace to pin pages
we'd need similar logic too.
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, April 27, 2023 2:39 PM
>
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, April 26, 2023 10:54 PM
> > @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
> > {
> > 	struct vfio_device *vdev = data;
> >
> > -	if (vdev->ops->dma_unmap)
> > +	/* noiommu devices cannot do map/unmap */
> > +	if (vdev->noiommu && vdev->ops->dma_unmap)
> > 		vdev->ops->dma_unmap(vdev, iova, length);
>
> Is it necessary? All mdev devices implementing @dma_unmap won't
> set the noiommu flag.

Hmmm. Yes, and all the devices that set noiommu do not implement @dma_unmap
as far as I can see. Maybe this noiommu check can be removed.

> Instead, in the future, if we allow noiommu userspace to pin pages
> we'd need similar logic too.

I'm not quite sure about it so far. For mdev devices, the device driver
may use vfio_pin_pages()/vfio_dma_rw() to pin pages, hence such drivers
need to listen to the dma_unmap() event. But for noiommu users, does the
device driver also participate in the page pinning? At least for the
vfio-pci driver it does not, or maybe it will in the future when noiommu
userspace is allowed to pin pages. It looks to me like such userspace
should order its DMA before calling the ioctl to unpin pages instead of
letting the device driver listen to unmap.

Regards,
Yi Liu
On Thu, 27 Apr 2023 06:59:17 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Thursday, April 27, 2023 2:39 PM
> >
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, April 26, 2023 10:54 PM
> > > @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
> > > {
> > > 	struct vfio_device *vdev = data;
> > >
> > > -	if (vdev->ops->dma_unmap)
> > > +	/* noiommu devices cannot do map/unmap */
> > > +	if (vdev->noiommu && vdev->ops->dma_unmap)
> > > 		vdev->ops->dma_unmap(vdev, iova, length);
> >
> > Is it necessary? All mdev devices implementing @dma_unmap won't
> > set the noiommu flag.
>
> Hmmm. Yes, and all the devices that set noiommu do not implement @dma_unmap
> as far as I can see. Maybe this noiommu check can be removed.

Not to mention that the polarity of the noiommu test is backwards here!
This also seems to be the only performance path where noiommu is tested,
and therefore I believe the only actual justification for the previous
patch.

> > Instead, in the future, if we allow noiommu userspace to pin pages
> > we'd need similar logic too.
>
> I'm not quite sure about it so far. For mdev devices, the device driver
> may use vfio_pin_pages()/vfio_dma_rw() to pin pages, hence such drivers
> need to listen to the dma_unmap() event. But for noiommu users, does the
> device driver also participate in the page pinning? At least for the
> vfio-pci driver it does not, or maybe it will in the future when noiommu
> userspace is allowed to pin pages. It looks to me like such userspace
> should order its DMA before calling the ioctl to unpin pages instead of
> letting the device driver listen to unmap.

Whoa, noiommu is inherently unsafe and only meant to expose the vfio
device interface for userspace drivers that are going to do unsafe
things regardless. Enabling noiommu to work with mdev, pin pages, or
anything else should not be on our agenda. Userspaces relying on noiommu
get the minimum viable interface and must impose only a minuscule
incremental maintenance burden. The only reason we're spending so much
effort on it here is to make iommufd noiommu support equivalent to
group/container noiommu support. We should stop at that. Thanks,

Alex
On 2023/4/28 02:32, Alex Williamson wrote:
> On Thu, 27 Apr 2023 06:59:17 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
>>> From: Tian, Kevin <kevin.tian@intel.com>
>>> Sent: Thursday, April 27, 2023 2:39 PM
>>>
>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>> Sent: Wednesday, April 26, 2023 10:54 PM
>>>> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
>>>> {
>>>> 	struct vfio_device *vdev = data;
>>>>
>>>> -	if (vdev->ops->dma_unmap)
>>>> +	/* noiommu devices cannot do map/unmap */
>>>> +	if (vdev->noiommu && vdev->ops->dma_unmap)
>>>> 		vdev->ops->dma_unmap(vdev, iova, length);
>>>
>>> Is it necessary? All mdev devices implementing @dma_unmap won't
>>> set the noiommu flag.
>>
>> Hmmm. Yes, and all the devices that set noiommu do not implement @dma_unmap
>> as far as I can see. Maybe this noiommu check can be removed.
>
> Not to mention that the polarity of the noiommu test is backwards here!
> This also seems to be the only performance path where noiommu is tested,
> and therefore I believe the only actual justification for the previous
> patch.

But this patch needs to use vfio_iommufd_emulated_bind() and
vfio_iommufd_emulated_unbind() for noiommu devices when binding
to iommufd, so it needs to check noiommu in vfio_iommufd_bind()
and vfio_iommufd_unbind() as well.

>>> Instead, in the future, if we allow noiommu userspace to pin pages
>>> we'd need similar logic too.
>>
>> I'm not quite sure about it so far. For mdev devices, the device driver
>> may use vfio_pin_pages()/vfio_dma_rw() to pin pages, hence such drivers
>> need to listen to the dma_unmap() event. But for noiommu users, does the
>> device driver also participate in the page pinning? At least for the
>> vfio-pci driver it does not, or maybe it will in the future when noiommu
>> userspace is allowed to pin pages. It looks to me like such userspace
>> should order its DMA before calling the ioctl to unpin pages instead of
>> letting the device driver listen to unmap.
>
> Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> device interface for userspace drivers that are going to do unsafe
> things regardless. Enabling noiommu to work with mdev, pin pages, or
> anything else should not be on our agenda.

One clarification: I think the idea from Jason is to make noiommu
userspace able to pin pages [1]. But this is just a potential benefit
of creating an iommufd_access for noiommu devices. There is no
intention to make noiommu devices work with mdev.

[1] https://lore.kernel.org/kvm/ZD1MCc6fD+oisjki@nvidia.com/#t

> Userspaces relying on noiommu
> get the minimum viable interface and must impose only a minuscule
> incremental maintenance burden. The only reason we're spending so much
> effort on it here is to make iommufd noiommu support equivalent to
> group/container noiommu support. We should stop at that. Thanks,

Yes. This is why this patch binds noiommu devices to iommufd and
creates an iommufd_access. Otherwise, noiommu devices would have
trouble in the hot-reset path once the iommufd-based ownership check
model is applied, and that is the only model for cdev. So binding
noiommu devices to iommufd is necessary to support noiommu in the
cdev interface.

If the above makes sense, then back to the question of the noiommu
check in vfio_emulated_unmap(). At first, I intended to have such a
check to avoid calling the dma_unmap callback for noiommu devices. But
per Kevin's comment and your statement above on mdev and noiommu, in
reality noiommu device drivers won't implement the dma_unmap callback.
So it is probably fine to remove the noiommu check in
vfio_emulated_unmap().
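For reference, this is essentially what the function looks like with the noiommu test dropped, i.e. the pre-patch check shown in the hunk above; the comment is mine and only records the conclusion of this sub-thread, it is not from the posted patch.

```c
static void vfio_emulated_unmap(void *data, unsigned long iova,
				unsigned long length)
{
	struct vfio_device *vdev = data;

	/*
	 * Only mdev-style drivers that pin pages via vfio_pin_pages()/
	 * vfio_dma_rw() implement dma_unmap; noiommu device drivers do
	 * not, so no separate noiommu test is needed on this path.
	 */
	if (vdev->ops->dma_unmap)
		vdev->ops->dma_unmap(vdev, iova, length);
}
```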
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, April 28, 2023 2:21 PM
>
> On 2023/4/28 02:32, Alex Williamson wrote:
> > On Thu, 27 Apr 2023 06:59:17 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> >>> From: Tian, Kevin <kevin.tian@intel.com>
> >>> Sent: Thursday, April 27, 2023 2:39 PM
> >>>
> >>>> From: Liu, Yi L <yi.l.liu@intel.com>
> >>>> Sent: Wednesday, April 26, 2023 10:54 PM
> >>>> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
> >>>> {
> >>>> 	struct vfio_device *vdev = data;
> >>>>
> >>>> -	if (vdev->ops->dma_unmap)
> >>>> +	/* noiommu devices cannot do map/unmap */
> >>>> +	if (vdev->noiommu && vdev->ops->dma_unmap)
> >>>> 		vdev->ops->dma_unmap(vdev, iova, length);
> >>>
> >>> Is it necessary? All mdev devices implementing @dma_unmap won't
> >>> set the noiommu flag.
> >>
> >> Hmmm. Yes, and all the devices that set noiommu do not implement @dma_unmap
> >> as far as I can see. Maybe this noiommu check can be removed.
> >
> > Not to mention that the polarity of the noiommu test is backwards here!
> > This also seems to be the only performance path where noiommu is tested,
> > and therefore I believe the only actual justification for the previous
> > patch.
>
> But this patch needs to use vfio_iommufd_emulated_bind() and
> vfio_iommufd_emulated_unbind() for noiommu devices when binding
> to iommufd, so it needs to check noiommu in vfio_iommufd_bind()
> and vfio_iommufd_unbind() as well.
>

You can continue to use vfio_device_is_noiommu() in this patch. It's not
a big deal to drop it from this series and then add it back in the cdev
series.
On 2023/4/28 15:00, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Friday, April 28, 2023 2:21 PM
>>
>> On 2023/4/28 02:32, Alex Williamson wrote:
>>> On Thu, 27 Apr 2023 06:59:17 +0000
>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>>>
>>>>> From: Tian, Kevin <kevin.tian@intel.com>
>>>>> Sent: Thursday, April 27, 2023 2:39 PM
>>>>>
>>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>>> Sent: Wednesday, April 26, 2023 10:54 PM
>>>>>> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
>>>>>> {
>>>>>> 	struct vfio_device *vdev = data;
>>>>>>
>>>>>> -	if (vdev->ops->dma_unmap)
>>>>>> +	/* noiommu devices cannot do map/unmap */
>>>>>> +	if (vdev->noiommu && vdev->ops->dma_unmap)
>>>>>> 		vdev->ops->dma_unmap(vdev, iova, length);
>>>>>
>>>>> Is it necessary? All mdev devices implementing @dma_unmap won't
>>>>> set the noiommu flag.
>>>>
>>>> Hmmm. Yes, and all the devices that set noiommu do not implement @dma_unmap
>>>> as far as I can see. Maybe this noiommu check can be removed.
>>>
>>> Not to mention that the polarity of the noiommu test is backwards here!
>>> This also seems to be the only performance path where noiommu is tested,
>>> and therefore I believe the only actual justification for the previous
>>> patch.
>>
>> But this patch needs to use vfio_iommufd_emulated_bind() and
>> vfio_iommufd_emulated_unbind() for noiommu devices when binding
>> to iommufd, so it needs to check noiommu in vfio_iommufd_bind()
>> and vfio_iommufd_unbind() as well.
>>
>
> You can continue to use vfio_device_is_noiommu() in this patch. It's not
> a big deal to drop it from this series and then add it back in the cdev
> series.

Yes. :-) Patch 01 of this series was added mostly per the cdev series
reviews.
On Fri, Apr 28, 2023 at 02:21:26PM +0800, Yi Liu wrote:

> But this patch needs to use vfio_iommufd_emulated_bind() and
> vfio_iommufd_emulated_unbind() for noiommu devices when binding
> to iommufd, so it needs to check noiommu in vfio_iommufd_bind()
> and vfio_iommufd_unbind() as well.

I'm not sure this is strictly necessary, it just needs an access.

The emulated stuff is for mdev only, it should not be confused with
no-iommu.

Eg if you had a no_iommu_access value to store the access it would be
fine and could serve as the 'this is no_iommu' flag.

The only purpose of the access at this moment is to get an iommufd ctx
id to return to userspace.

Jason
On 2023/4/28 20:07, Jason Gunthorpe wrote:
> On Fri, Apr 28, 2023 at 02:21:26PM +0800, Yi Liu wrote:
>
>> But this patch needs to use vfio_iommufd_emulated_bind() and
>> vfio_iommufd_emulated_unbind() for noiommu devices when binding
>> to iommufd, so it needs to check noiommu in vfio_iommufd_bind()
>> and vfio_iommufd_unbind() as well.
>
> I'm not sure this is strictly necessary, it just needs an access.
>
> The emulated stuff is for mdev only, it should not be confused with
> no-iommu.

Hmmm. I guess the confusion is due to the reuse of
vfio_iommufd_emulated_bind().

> Eg if you had a no_iommu_access value to store the access it would be
> fine and could serve as the 'this is no_iommu' flag.

So this no_iommu_access shall be created per iommufd bind, and call
iommufd_access_create() with the iommufd_access_ops. Is that right? If
so, this is not 100% the same as the noiommu flag, as that flag is
static after device registration.

> The only purpose of the access at this moment is to get an iommufd ctx
> id to return to userspace.

Yes, and it is the iommufd_access ID that is returned to userspace.
On 2023/4/28 02:32, Alex Williamson wrote: > On Thu, 27 Apr 2023 06:59:17 +0000 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > [...] >> >> I'm not quite sure about it so far. For mdev devices, the device driver >> may use vfio_pin_pages/vfio_dma_rw () to pin page. Hence such drivers >> need to listen to dma_unmap() event. But for noiommu users, does the >> device driver also participate in the page pin? At least for vfio-pci driver, >> it does not, or maybe it will in the future when enabling noiommu >> userspace to pin pages. It looks to me such userspace should order >> the DMA before calling ioctl to unpin page instead of letting device >> driver listen to unmap. > > Whoa, noiommu is inherently unsafe an only meant to expose the vfio > device interface for userspace drivers that are going to do unsafe > things regardless. Enabling noiommu to work with mdev, pin pages, or > anything else should not be on our agenda. Userspaces relying on niommu > get the minimum viable interface and must impose a minuscule > incremental maintenance burden. The only reason we're spending so much > effort on it here is to make iommufd noiommu support equivalent to > group/container noiommu support. We should stop at that. Thanks, btw. I asked a question in [1] to check if we should allow attach/detach on noiommu devices. Jason has replied it. If in future noiommu userspace can pin page, then such userspace will need to attach/detach ioas. So I made cdev series[2] to allow attach ioas on noiommu devices. Supporting it from cdev day-1 may avoid probing if attach/detach is supported or not for specific devices when adding pin page for noiommu userspace. But now, I think such a support will not in plan, is it? If so, will it be better to disallow attach/detach on noiommu devices in patch [2]? [1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/ [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/
On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:

> > The emulated stuff is for mdev only, it should not be confused with
> > no-iommu.
>
> Hmmm. I guess the confusion is due to the reuse of
> vfio_iommufd_emulated_bind().

This is probably not a good direction.

> > Eg if you had a no_iommu_access value to store the access it would be
> > fine and could serve as the 'this is no_iommu' flag.
>
> So this no_iommu_access shall be created per iommufd bind, and call
> iommufd_access_create() with the iommufd_access_ops. Is that right? If
> so, this is not 100% the same as the noiommu flag, as that flag is
> static after device registration.

Something like that, yes.

I don't think there is any real difference from the current flag; both
are determined at the first ioctl when the iommufd is presented and
both would stay permanent until the fd is closed.

Jason
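A minimal sketch of this suggestion, not the posted patch: the `noiommu_access` field, the helper names, and `vfio_noiommu_access_ops` are hypothetical, and the `iommufd_access_create()` call is approximate since its signature was still changing across these series. The stored access pointer doubles as the "bound as noiommu" indicator.

```c
/* Hypothetical helper names; the dedicated ops with a no-op unmap is
 * shown in a later sketch. */
static int vfio_iommufd_noiommu_bind(struct vfio_device *vdev,
				     struct iommufd_ctx *ictx,
				     u32 *out_device_id)
{
	struct iommufd_access *access;

	lockdep_assert_held(&vdev->dev_set->lock);

	access = iommufd_access_create(ictx, &vfio_noiommu_access_ops, vdev,
				       out_device_id);
	if (IS_ERR(access))
		return PTR_ERR(access);
	/* Serves as the 'this is no_iommu' flag for the life of the bind. */
	vdev->noiommu_access = access;
	return 0;
}

static void vfio_iommufd_noiommu_unbind(struct vfio_device *vdev)
{
	lockdep_assert_held(&vdev->dev_set->lock);

	if (vdev->noiommu_access) {
		iommufd_access_destroy(vdev->noiommu_access);
		vdev->noiommu_access = NULL;
	}
}
```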
On Sat, Apr 29, 2023 at 12:13:39AM +0800, Yi Liu wrote:

> > Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> > device interface for userspace drivers that are going to do unsafe
> > things regardless. Enabling noiommu to work with mdev, pin pages, or
> > anything else should not be on our agenda. Userspaces relying on noiommu
> > get the minimum viable interface and must impose only a minuscule
> > incremental maintenance burden. The only reason we're spending so much
> > effort on it here is to make iommufd noiommu support equivalent to
> > group/container noiommu support. We should stop at that. Thanks,
>
> btw. I asked a question in [1] to check whether we should allow
> attach/detach on noiommu devices, and Jason replied to it. If in the
> future noiommu userspace can pin pages, then such userspace will need to
> attach/detach an IOAS. So I made the cdev series [2] allow attaching an
> IOAS to noiommu devices. Supporting it from cdev day-1 may avoid having
> to probe whether attach/detach is supported for specific devices when
> adding page pinning for noiommu userspace later.
>
> But now, I think such support is not in plan, is it? If so, would it be
> better to disallow attach/detach on noiommu devices in patch [2]?
>
> [1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/
> [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/

If we block it then userspace has to act quite differently, I think we
should keep it.

My general idea to complete the no-iommu feature is to add a new IOCTL
to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
would call instead of trying to abuse mlock and /proc/ to do it. That
ioctl would use the IOAS attached to the access just like an mdev would
do, so it has a real IOVA, but it is not an mdev.

The unmap callback just does nothing; as Alex says it is all still
totally unsafe.

This just allows it to use the mm a little more properly and safely (eg
mlock() doesn't set things like page_maybe_dma_pinned(), proc doesn't
reject things like DAX, and it currently doesn't make an adjustment for
the PCI offset stuff..). So it would make DPDK a little more robust and
portable and make the whole VFIO no-iommu feature much easier to use.

To do that we need an iommufd access, an access ID, and we need to link
the current IOAS to the special access, like an mdev, but not in any mdev
code paths.

That creating the access ID solves the reset problem as well is a nice
side effect and is the only part of this you should focus on for now.

Jason
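Purely to illustrate the shape of such an interface: a hypothetical uapi struct for a "pin this range and return its DMA address" ioctl. Nothing like this exists in the VFIO uapi; the names and the ioctl number below are invented for the sketch.

```c
/* Hypothetical uapi sketch -- NOT an existing VFIO ioctl. It only
 * illustrates the 'pin iova and return dma addr' idea described above. */
struct vfio_noiommu_pin_dma {
	__u32	argsz;
	__u32	flags;
	__u64	vaddr;	/* userspace virtual address of the range to pin */
	__u64	size;	/* size of the range in bytes */
	__u64	iova;	/* out: IOVA usable as the device DMA address */
};
/* The ioctl number is arbitrary, chosen only for the example. */
#define VFIO_DEVICE_NOIOMMU_PIN_DMA	_IO(VFIO_TYPE, VFIO_BASE + 42)
```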
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 3, 2023 2:12 AM
>
> On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > The emulated stuff is for mdev only, it should not be confused with
> > > no-iommu.
> >
> > Hmmm. I guess the confusion is due to the reuse of
> > vfio_iommufd_emulated_bind().
>
> This is probably not a good direction.

I see. But if not reusing, then there may be a bit of code duplication.
I'm fine with adding separate _bind()/_unbind() functions for noiommu
devices if Alex and you prefer it.

> > > Eg if you had a no_iommu_access value to store the access it would be
> > > fine and could serve as the 'this is no_iommu' flag.
> >
> > So this no_iommu_access shall be created per iommufd bind, and call
> > iommufd_access_create() with the iommufd_access_ops. Is that right? If
> > so, this is not 100% the same as the noiommu flag, as that flag is
> > static after device registration.
>
> Something like that, yes.
>
> I don't think there is any real difference from the current flag; both
> are determined at the first ioctl when the iommufd is presented and
> both would stay permanent until the fd is closed.

Well, the noiommu flag is static from registration till unregistration. :-)
While no_iommu_access's life cycle is between the bind and the fd close.
But given that its major usage is during the window between the fd being
bound to iommufd and closed, it's still possible to let no_iommu_access
serve as the noiommu flag.
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 3, 2023 2:22 AM
>
> On Sat, Apr 29, 2023 at 12:13:39AM +0800, Yi Liu wrote:
> > > Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> > > device interface for userspace drivers that are going to do unsafe
> > > things regardless. Enabling noiommu to work with mdev, pin pages, or
> > > anything else should not be on our agenda. Userspaces relying on noiommu
> > > get the minimum viable interface and must impose only a minuscule
> > > incremental maintenance burden. The only reason we're spending so much
> > > effort on it here is to make iommufd noiommu support equivalent to
> > > group/container noiommu support. We should stop at that. Thanks,
> >
> > btw. I asked a question in [1] to check whether we should allow
> > attach/detach on noiommu devices, and Jason replied to it. If in the
> > future noiommu userspace can pin pages, then such userspace will need to
> > attach/detach an IOAS. So I made the cdev series [2] allow attaching an
> > IOAS to noiommu devices. Supporting it from cdev day-1 may avoid having
> > to probe whether attach/detach is supported for specific devices when
> > adding page pinning for noiommu userspace later.
> >
> > But now, I think such support is not in plan, is it? If so, would it be
> > better to disallow attach/detach on noiommu devices in patch [2]?
> >
> > [1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/
> > [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/
>
> If we block it then userspace has to act quite differently, I think we
> should keep it.

Maybe the kernel can simply fail the attach/detach if it happens on
noiommu devices, and noiommu userspace should just know it would fail.
@Alex, what is your opinion?

> My general idea to complete the no-iommu feature is to add a new IOCTL
> to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> would call instead of trying to abuse mlock and /proc/ to do it. That
> ioctl would use the IOAS attached to the access just like an mdev would
> do, so it has a real IOVA, but it is not an mdev.

This new ioctl may be an IOMMUFD ioctl since its input is the IOAS and
the address, nothing related to the device. Is it?

> The unmap callback just does nothing; as Alex says it is all still
> totally unsafe.

Sure. That's also why I added the noiommu test, to avoid calling the
unmap callback, although it seems not possible to have an unmap callback
since only mdev drivers would implement it.

> This just allows it to use the mm a little more properly and safely (eg
> mlock() doesn't set things like page_maybe_dma_pinned(), proc doesn't
> reject things like DAX, and it currently doesn't make an adjustment for
> the PCI offset stuff..). So it would make DPDK a little more robust and
> portable and make the whole VFIO no-iommu feature much easier to use.

Thanks for the explanation.

> To do that we need an iommufd access, an access ID, and we need to link
> the current IOAS to the special access, like an mdev, but not in any mdev
> code paths.
>
> That creating the access ID solves the reset problem as well is a nice
> side effect and is the only part of this you should focus on for now.

Yes. I get this part. We only need the access ID so far to fix the
noiommu gap in hot-reset.

Regards,
Yi Liu
> > My general idea to complete the no-iommu feature is to add a new IOCTL
> > to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> > would call instead of trying to abuse mlock and /proc/ to do it. That
> > ioctl would use the IOAS attached to the access just like an mdev would
> > do, so it has a real IOVA, but it is not an mdev.
>
> This new ioctl may be an IOMMUFD ioctl since its input is the IOAS and
> the address, nothing related to the device. Is it?

No, definitely a VFIO special ioctl for VFIO no-iommu mode.

> Sure. That's also why I added the noiommu test, to avoid calling the
> unmap callback, although it seems not possible to have an unmap callback
> since only mdev drivers would implement it.

Just have a special noiommu ops, use an empty unmap function, and pass
that to the special noiommu access creation.

Jason
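A minimal sketch of that suggestion, assuming hypothetical names (`vfio_noiommu_dma_unmap`, `vfio_noiommu_access_ops`); unlike the emulated (mdev) ops, the unmap callback is an intentional no-op because noiommu userspace never maps through iommufd.

```c
/* Hedged sketch only: a dedicated access ops for noiommu binds. */
static void vfio_noiommu_dma_unmap(void *data, unsigned long iova,
				   unsigned long length)
{
	/* Nothing to do: no pages are pinned on behalf of a noiommu device. */
}

static const struct iommufd_access_ops vfio_noiommu_access_ops = {
	.unmap = vfio_noiommu_dma_unmap,
};
```

This ops table would be what the bind-time `iommufd_access_create()` call in the earlier sketch passes in, instead of reusing the emulated ops that mdev drivers rely on.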
On Wed, May 03, 2023 at 09:48:36AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 3, 2023 2:12 AM
> >
> > On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > > The emulated stuff is for mdev only, it should not be confused with
> > > > no-iommu.
> > >
> > > Hmmm. I guess the confusion is due to the reuse of
> > > vfio_iommufd_emulated_bind().
> >
> > This is probably not a good direction.
>
> I see. But if not reusing, then there may be a bit of code duplication.
> I'm fine with adding separate _bind()/_unbind() functions for noiommu
> devices if Alex and you prefer it.

I think you will find there is minimal duplication.

Jason
On Wed, 3 May 2023 16:41:52 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> > > My general idea to complete the no-iommu feature is to add a new IOCTL
> > > to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> > > would call instead of trying to abuse mlock and /proc/ to do it. That
> > > ioctl would use the IOAS attached to the access just like an mdev would
> > > do, so it has a real IOVA, but it is not an mdev.
> >
> > This new ioctl may be an IOMMUFD ioctl since its input is the IOAS and
> > the address, nothing related to the device. Is it?
>
> No, definitely a VFIO special ioctl for VFIO no-iommu mode.

This seems like brushing off the dirty work onto vfio. Userspace drivers
relying on no-iommu are in pretty questionable territory already; do we
care that they don't have a good means to pin pages and derive the DMA
address of those pinned pages? As I noted earlier, I'm really not
interested in expanding no-iommu. It's less important now than when it
was added (we have vIOMMUs in VMs now and more platforms with IOMMUs),
and we shouldn't be encouraging its use by further developing the
interface. Thanks,

Alex
This binds noiommu devices to iommufd and creates an iommufd_access for
the bond. This is useful for adding an iommufd-based device ownership
check for VFIO_DEVICE_PCI_HOT_RESET, since this model requires all the
other affected devices to be bound to the same iommufd as the device to
be reset. For noiommu devices there is no backend iommu, so create an
iommufd_access instead of an iommufd_device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 895852ad37ed..ca29c4feded3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -29,7 +29,8 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 		 */
 		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
 			return -EPERM;
-		return 0;
+
+		return vfio_iommufd_emulated_bind(vdev, ictx, &device_id);
 	}
 
 	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
@@ -59,8 +60,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vdev->noiommu)
+	if (vdev->noiommu) {
+		vfio_iommufd_emulated_unbind(vdev);
 		return;
+	}
 
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
@@ -110,10 +113,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
 /*
- * The emulated standard ops mean that vfio_device is going to use the
- * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
- * ops set should call vfio_register_emulated_iommu_dev(). Drivers that do
- * not call vfio_pin_pages()/vfio_dma_rw() have no need to provide dma_unmap.
+ * The emulated standard ops can be used in the below cases:
+ * 1) The vfio_device that is going to use the "mdev path" and will call
+ *    vfio_pin_pages()/vfio_dma_rw(). Such drivers should call
+ *    vfio_register_emulated_iommu_dev(). Drivers that do not call
+ *    vfio_pin_pages()/vfio_dma_rw() have no need to provide dma_unmap.
+ * 2) The noiommu device, which doesn't have a backend iommu but creating
+ *    an iommufd_access allows generating a dev_id for it. A noiommu device
+ *    is not allowed to do map/unmap, so this becomes a nop.
  */
 static void vfio_emulated_unmap(void *data, unsigned long iova,
@@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
 				unsigned long length)
 {
 	struct vfio_device *vdev = data;
 
-	if (vdev->ops->dma_unmap)
+	/* noiommu devices cannot do map/unmap */
+	if (vdev->noiommu && vdev->ops->dma_unmap)
 		vdev->ops->dma_unmap(vdev, iova, length);
 }
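For comparison with the second hunk above, a hedged sketch of how vfio_iommufd_unbind() might look with the dedicated noiommu helpers discussed in the thread (vfio_iommufd_noiommu_unbind() is the hypothetical helper from the earlier sketch) instead of reusing vfio_iommufd_emulated_unbind(); the bind side would similarly call vfio_iommufd_noiommu_bind() where the first hunk calls vfio_iommufd_emulated_bind().

```c
void vfio_iommufd_unbind(struct vfio_device *vdev)
{
	lockdep_assert_held(&vdev->dev_set->lock);

	if (vfio_device_is_noiommu(vdev)) {
		/* Tear down the access created at bind time; nothing else
		 * was set up for a noiommu device. */
		vfio_iommufd_noiommu_unbind(vdev);
		return;
	}

	if (vdev->ops->unbind_iommufd)
		vdev->ops->unbind_iommufd(vdev);
}
```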