diff mbox series

[v4,2/9] vfio-iommufd: Create iommufd_access for noiommu devices

Message ID 20230426145419.450922-3-yi.l.liu@intel.com (mailing list archive)
State: New, archived
Series: Enhance vfio PCI hot reset for vfio cdev device

Commit Message

Yi Liu April 26, 2023, 2:54 p.m. UTC
This binds a noiommu device to iommufd and creates an iommufd_access for
this bond. This is useful for adding an iommufd-based device ownership
check for VFIO_DEVICE_PCI_HOT_RESET, since this model requires all the
other affected devices to be bound to the same iommufd as the device
being reset. For noiommu devices there is no backend iommu, so create an
iommufd_access instead of an iommufd_device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

Comments

Tian, Kevin April 27, 2023, 6:39 a.m. UTC | #1
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, April 26, 2023 10:54 PM
> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
> unsigned long iova,
>  {
>  	struct vfio_device *vdev = data;
> 
> -	if (vdev->ops->dma_unmap)
> +	/* noiommu devices cannot do map/unmap */
> +	if (vdev->noiommu && vdev->ops->dma_unmap)
>  		vdev->ops->dma_unmap(vdev, iova, length);

Is it necessary? All mdev devices implementing @dma_unmap won't
set the noiommu flag.

Instead, in the future, if we allow noiommu userspace to pin pages
we'd need similar logic too.
Yi Liu April 27, 2023, 6:59 a.m. UTC | #2
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, April 27, 2023 2:39 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, April 26, 2023 10:54 PM
> > @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
> > unsigned long iova,
> >  {
> >  	struct vfio_device *vdev = data;
> >
> > -	if (vdev->ops->dma_unmap)
> > +	/* noiommu devices cannot do map/unmap */
> > +	if (vdev->noiommu && vdev->ops->dma_unmap)
> >  		vdev->ops->dma_unmap(vdev, iova, length);
> 
> Is it necessary? All mdev devices implementing @dma_unmap won't
> set noiommu flag.

Hmmm. Yes, and as far as I can see, none of the devices that set noiommu
implement @dma_unmap. Maybe this noiommu check can be removed.

> 
> Instead in the future if we allow noiommu userspace to pin pages
> we'd need similar logic too.

I'm not quite sure about it so far. For mdev devices, the device driver
may use vfio_pin_pages()/vfio_dma_rw() to pin pages. Hence such drivers
need to listen to the dma_unmap() event. But for noiommu users, does the
device driver also participate in page pinning? At least the vfio-pci
driver does not, or maybe it will in the future when noiommu userspace
is enabled to pin pages. It looks to me like such userspace should quiesce
the DMA before calling the ioctl to unpin pages, instead of letting the
device driver listen to unmap.

Regards,
Yi Liu
Alex Williamson April 27, 2023, 6:32 p.m. UTC | #3
On Thu, 27 Apr 2023 06:59:17 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Thursday, April 27, 2023 2:39 PM
> >   
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, April 26, 2023 10:54 PM
> > > @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
> > > unsigned long iova,
> > >  {
> > >  	struct vfio_device *vdev = data;
> > >
> > > -	if (vdev->ops->dma_unmap)
> > > +	/* noiommu devices cannot do map/unmap */
> > > +	if (vdev->noiommu && vdev->ops->dma_unmap)
> > >  		vdev->ops->dma_unmap(vdev, iova, length);  
> > 
> > Is it necessary? All mdev devices implementing @dma_unmap won't
> > set noiommu flag.  
> 
> Hmmm. Yes, and all the devices set noiommu is not implementing @dma_unmap
> as far as I see. Maybe this noiommu check can be removed.

Not to mention that the polarity of the noiommu test is backwards here!
This also seems to be the only performance path where noiommu is tested
and therefore I believe the only actual justification of the previous
patch.
 
> > Instead in the future if we allow noiommu userspace to pin pages
> > we'd need similar logic too.  
> 
> I'm not quite sure about it so far. For mdev devices, the device driver
> may use vfio_pin_pages/vfio_dma_rw () to pin page. Hence such drivers
> need to listen to dma_unmap() event. But for noiommu users, does the
> device driver also participate in the page pin? At least for vfio-pci driver,
> it does not, or maybe it will in the future when enabling noiommu
> userspace to pin pages. It looks to me such userspace should order
> the DMA before calling ioctl to unpin page instead of letting device
> driver listen to unmap.

Whoa, noiommu is inherently unsafe and only meant to expose the vfio
device interface for userspace drivers that are going to do unsafe
things regardless.  Enabling noiommu to work with mdev, pin pages, or
anything else should not be on our agenda.  Userspaces relying on noiommu
get the minimum viable interface and it must impose only a minuscule
incremental maintenance burden.  The only reason we're spending so much
effort on it here is to make iommufd noiommu support equivalent to
group/container noiommu support.  We should stop at that.  Thanks,

Alex
Yi Liu April 28, 2023, 6:21 a.m. UTC | #4
On 2023/4/28 02:32, Alex Williamson wrote:
> On Thu, 27 Apr 2023 06:59:17 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
>>> From: Tian, Kevin <kevin.tian@intel.com>
>>> Sent: Thursday, April 27, 2023 2:39 PM
>>>    
>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>> Sent: Wednesday, April 26, 2023 10:54 PM
>>>> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
>>>> unsigned long iova,
>>>>   {
>>>>   	struct vfio_device *vdev = data;
>>>>
>>>> -	if (vdev->ops->dma_unmap)
>>>> +	/* noiommu devices cannot do map/unmap */
>>>> +	if (vdev->noiommu && vdev->ops->dma_unmap)
>>>>   		vdev->ops->dma_unmap(vdev, iova, length);
>>>
>>> Is it necessary? All mdev devices implementing @dma_unmap won't
>>> set noiommu flag.
>>
>> Hmmm. Yes, and all the devices set noiommu is not implementing @dma_unmap
>> as far as I see. Maybe this noiommu check can be removed.
> 
> Not to mention that the polarity of the noiommu test is backwards here!
> This also seems to be the only performance path where noiommu is tested
> and therefore I believe the only actual justification of the previous
> patch.

but this patch needs to use vfio_iommufd_emulated_bind() and
vfio_iommufd_emulated_unbind() for the noiommu devices when binding
to iommufd. So it needs to check noiommu in vfio_iommufd_bind()
and vfio_iommufd_unbind() as well.

>>> Instead in the future if we allow noiommu userspace to pin pages
>>> we'd need similar logic too.
>>
>> I'm not quite sure about it so far. For mdev devices, the device driver
>> may use vfio_pin_pages/vfio_dma_rw () to pin page. Hence such drivers
>> need to listen to dma_unmap() event. But for noiommu users, does the
>> device driver also participate in the page pin? At least for vfio-pci driver,
>> it does not, or maybe it will in the future when enabling noiommu
>> userspace to pin pages. It looks to me such userspace should order
>> the DMA before calling ioctl to unpin page instead of letting device
>> driver listen to unmap.
> 
> Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> device interface for userspace drivers that are going to do unsafe
> things regardless.  Enabling noiommu to work with mdev, pin pages, or
> anything else should not be on our agenda.

One clarification. I think the idea from Jason is to make noiommu
userspace able to pin pages [1]. But this is just a potential
benefit of creating an iommufd_access for noiommu devices. There is no
intention to make noiommu devices work with mdev.

[1] https://lore.kernel.org/kvm/ZD1MCc6fD+oisjki@nvidia.com/#t

> Userspaces relying on noiommu
> get the minimum viable interface and must impose a minuscule
> incremental maintenance burden.  The only reason we're spending so much
> effort on it here is to make iommufd noiommu support equivalent to
> group/container noiommu support.  We should stop at that.  Thanks,

yes. This is why this patch binds noiommu devices to iommufd
and creates an iommufd_access. Otherwise, noiommu devices would have
trouble in the hot-reset path when the iommufd-based ownership check
model is applied, and this is the only model for cdev. So binding
noiommu devices to iommufd is necessary for supporting noiommu in the
cdev interface.

If the above makes sense, then back to the question of the noiommu
check in vfio_emulated_unmap(). At first, I intended to have such
a check to avoid calling the dma_unmap callback for noiommu devices.
But per Kevin's comment and your statement above on mdev and noiommu,
in reality noiommu device drivers won't implement the dma_unmap
callback. So it is probably fine to remove the noiommu check in
vfio_emulated_unmap().
Tian, Kevin April 28, 2023, 7 a.m. UTC | #5
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, April 28, 2023 2:21 PM
> 
> On 2023/4/28 02:32, Alex Williamson wrote:
> > On Thu, 27 Apr 2023 06:59:17 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> >>> From: Tian, Kevin <kevin.tian@intel.com>
> >>> Sent: Thursday, April 27, 2023 2:39 PM
> >>>
> >>>> From: Liu, Yi L <yi.l.liu@intel.com>
> >>>> Sent: Wednesday, April 26, 2023 10:54 PM
> >>>> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
> >>>> unsigned long iova,
> >>>>   {
> >>>>   	struct vfio_device *vdev = data;
> >>>>
> >>>> -	if (vdev->ops->dma_unmap)
> >>>> +	/* noiommu devices cannot do map/unmap */
> >>>> +	if (vdev->noiommu && vdev->ops->dma_unmap)
> >>>>   		vdev->ops->dma_unmap(vdev, iova, length);
> >>>
> >>> Is it necessary? All mdev devices implementing @dma_unmap won't
> >>> set noiommu flag.
> >>
> >> Hmmm. Yes, and all the devices set noiommu is not implementing
> @dma_unmap
> >> as far as I see. Maybe this noiommu check can be removed.
> >
> > Not to mention that the polarity of the noiommu test is backwards here!
> > This also seems to be the only performance path where noiommu is tested
> > and therefore I believe the only actual justification of the previous
> > patch.
> 
> but this patch needs to use vfio_iommufd_emulated_bind() and
> vfio_iommufd_emulated_unbind() for the noiommu devices when binding
> to iommufd. So needs to check noiommu in the vfio_iommufd_bind()
> and vfio_iommu_unbind() as well.
> 

You can continue to use vfio_device_is_noiommu() in this patch. It's not
a big deal to drop it from this series and then add it back in the cdev series.
Yi Liu April 28, 2023, 7:04 a.m. UTC | #6
On 2023/4/28 15:00, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Friday, April 28, 2023 2:21 PM
>>
>> On 2023/4/28 02:32, Alex Williamson wrote:
>>> On Thu, 27 Apr 2023 06:59:17 +0000
>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>>>
>>>>> From: Tian, Kevin <kevin.tian@intel.com>
>>>>> Sent: Thursday, April 27, 2023 2:39 PM
>>>>>
>>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>>> Sent: Wednesday, April 26, 2023 10:54 PM
>>>>>> @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
>>>>>> unsigned long iova,
>>>>>>    {
>>>>>>    	struct vfio_device *vdev = data;
>>>>>>
>>>>>> -	if (vdev->ops->dma_unmap)
>>>>>> +	/* noiommu devices cannot do map/unmap */
>>>>>> +	if (vdev->noiommu && vdev->ops->dma_unmap)
>>>>>>    		vdev->ops->dma_unmap(vdev, iova, length);
>>>>>
>>>>> Is it necessary? All mdev devices implementing @dma_unmap won't
>>>>> set noiommu flag.
>>>>
>>>> Hmmm. Yes, and all the devices set noiommu is not implementing
>> @dma_unmap
>>>> as far as I see. Maybe this noiommu check can be removed.
>>>
>>> Not to mention that the polarity of the noiommu test is backwards here!
>>> This also seems to be the only performance path where noiommu is tested
>>> and therefore I believe the only actual justification of the previous
>>> patch.
>>
>> but this patch needs to use vfio_iommufd_emulated_bind() and
>> vfio_iommufd_emulated_unbind() for the noiommu devices when binding
>> to iommufd. So needs to check noiommu in the vfio_iommufd_bind()
>> and vfio_iommu_unbind() as well.
>>
> 
> You can continue to use vfio_device_is_noiommu() in this patch. It's not
> big deal to drop it from this series and then add back in cdev series.

yes. :-) Patch 01 of this series was added mainly per the cdev series reviews.
Jason Gunthorpe April 28, 2023, 12:07 p.m. UTC | #7
On Fri, Apr 28, 2023 at 02:21:26PM +0800, Yi Liu wrote:

> but this patch needs to use vfio_iommufd_emulated_bind() and
> vfio_iommufd_emulated_unbind() for the noiommu devices when binding
> to iommufd. So needs to check noiommu in the vfio_iommufd_bind()
> and vfio_iommu_unbind() as well.

I'm not sure this is strictly necessary, it just needs an access

The emulated stuff is for mdev only, it should not be confused with
no-iommu

Eg if you had a no_iommu_access value to store the access it would be
fine and could serve as the 'this is no_iommu' flag

The only purpose of the access at this moment is to get an iommufd ctx
id to return to userspace.

Jason
Yi Liu April 28, 2023, 4:07 p.m. UTC | #8
On 2023/4/28 20:07, Jason Gunthorpe wrote:
> On Fri, Apr 28, 2023 at 02:21:26PM +0800, Yi Liu wrote:
> 
>> but this patch needs to use vfio_iommufd_emulated_bind() and
>> vfio_iommufd_emulated_unbind() for the noiommu devices when binding
>> to iommufd. So needs to check noiommu in the vfio_iommufd_bind()
>> and vfio_iommu_unbind() as well.
> 
> I'm not sure this is strictly necessary, it just needs an access
> 
> The emulated stuff is for mdev only, it should not be confused with
> no-iommu

hmmm. I guess the confusion is due to the reuse of 
vfio_iommufd_emulated_bind().

> 
> Eg if you had a no_iommu_access value to store the access it would be
> fine and could serve as the 'this is no_iommu' flag

So this no_iommu_access would be created per iommufd bind, by calling
iommufd_access_create() with the iommufd_access_ops. Is that right? If so,
this is not 100% the same as the no_iommu flag, since that flag is static
after device registration.

> 
> The only purpose of the access at this moment is to get an iommufdctx
> id to return to userspace.

yes, and it is the iommufd_access ID that is returned to userspace.
Yi Liu April 28, 2023, 4:13 p.m. UTC | #9
On 2023/4/28 02:32, Alex Williamson wrote:
> On Thu, 27 Apr 2023 06:59:17 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
[...]
>>
>> I'm not quite sure about it so far. For mdev devices, the device driver
>> may use vfio_pin_pages/vfio_dma_rw () to pin page. Hence such drivers
>> need to listen to dma_unmap() event. But for noiommu users, does the
>> device driver also participate in the page pin? At least for vfio-pci driver,
>> it does not, or maybe it will in the future when enabling noiommu
>> userspace to pin pages. It looks to me such userspace should order
>> the DMA before calling ioctl to unpin page instead of letting device
>> driver listen to unmap.
> 
> Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> device interface for userspace drivers that are going to do unsafe
> things regardless.  Enabling noiommu to work with mdev, pin pages, or
> anything else should not be on our agenda.  Userspaces relying on noiommu
> get the minimum viable interface and must impose a minuscule
> incremental maintenance burden.  The only reason we're spending so much
> effort on it here is to make iommufd noiommu support equivalent to
> group/container noiommu support.  We should stop at that.  Thanks,

btw. I asked a question in [1] to check whether we should allow attach/detach
on noiommu devices, and Jason replied to it. If in the future noiommu userspace
can pin pages, then such userspace will need to attach/detach an ioas. So I
made the cdev series [2] allow attaching an ioas on noiommu devices. Supporting
it from cdev day 1 may avoid probing whether attach/detach is supported
for specific devices when adding page pinning for noiommu userspace.

But now, I think such a support is not planned, is it? If so, would it
be better to disallow attach/detach on noiommu devices in patch [2]?

[1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/
[2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/
Jason Gunthorpe May 2, 2023, 6:12 p.m. UTC | #10
On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > The emulated stuff is for mdev only, it should not be confused with
> > no-iommu
> 
> hmmm. I guess the confusion is due to the reuse of
> vfio_iommufd_emulated_bind().

This is probably not a good direction

> > Eg if you had a no_iommu_access value to store the access it would be
> > fine and could serve as the 'this is no_iommu' flag
> 
> So this no_iommu_access shall be created per iommufd bind, and call the
> iommufd_access_create() with iommufd_access_ops. is it? If so, this is
> not 100% the same with no_iommu flag as this flag is static after device
> registration.

Something like that, yes

I don't think there is any real difference from the current flag; both
are determined at the first ioctl when the iommufd is presented and
both would persist until the fd is closed

Jason
Jason Gunthorpe May 2, 2023, 6:22 p.m. UTC | #11
On Sat, Apr 29, 2023 at 12:13:39AM +0800, Yi Liu wrote:

> > Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> > device interface for userspace drivers that are going to do unsafe
> > things regardless.  Enabling noiommu to work with mdev, pin pages, or
> > anything else should not be on our agenda.  Userspaces relying on noiommu
> > get the minimum viable interface and must impose a minuscule
> > incremental maintenance burden.  The only reason we're spending so much
> > effort on it here is to make iommufd noiommu support equivalent to
> > group/container noiommu support.  We should stop at that.  Thanks,
> 
> btw. I asked a question in [1] to check if we should allow attach/detach
> on noiommu devices. Jason has replied it. If in future noiommu userspace
> can pin page, then such userspace will need to attach/detach ioas. So I
> made cdev series[2] to allow attach ioas on noiommu devices. Supporting
> it from cdev day-1 may avoid probing if attach/detach is supported or
> not for specific devices when adding pin page for noiommu userspace.
> 
> But now, I think such a support will not in plan, is it? If so, will it
> be better to disallow attach/detach on noiommu devices in patch [2]?
> 
> [1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/
> [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/

If we block it then userspace has to act quite differently, I think we
should keep it.

My general idea to complete the no-iommu feature is to add a new IOCTL
to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
would call instead of trying to abuse mlock and /proc/ to do it. That
ioctl would use the IOAS attached to the access just like a mdev would
do, so it has a real IOVA, but it is not a mdev.

unmap callback just does nothing, as Alex says it is all still totally
unsafe.

This just allows it to use the mm a little more properly and safely (eg
mlock() doesn't set things like page_maybe_dma_pinned(), proc doesn't
reject things like DAX and it currently doesn't make an adjustment for
the PCI offset stuff..) So it would make DPDK a little more robust,
portable and make the whole VFIO no-iommu feature much easier to use.

To do that we need an iommufd access, an access ID, and we need to link
the current IOAS to the special access, like an mdev, but not in any mdev
code paths.

That creating the access ID solves the reset problem as well is a nice
side effect and is the only part of this you should focus on for now..

Jason
Yi Liu May 3, 2023, 9:48 a.m. UTC | #12
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 3, 2023 2:12 AM
> 
> On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > The emulated stuff is for mdev only, it should not be confused with
> > > no-iommu
> >
> > hmmm. I guess the confusion is due to the reuse of
> > vfio_iommufd_emulated_bind().
> 
> This is probably not a good direction

I see. But if we don't reuse it, then there may be some code duplication.
I'm fine with adding separate _bind/unbind() functions for noiommu devices
if Alex and you prefer it.

> > > Eg if you had a no_iommu_access value to store the access it would be
> > > fine and could serve as the 'this is no_iommu' flag
> >
> > So this no_iommu_access shall be created per iommufd bind, and call the
> > iommufd_access_create() with iommufd_access_ops. is it? If so, this is
> > not 100% the same with no_iommu flag as this flag is static after device
> > registration.
> 
> Something like that, yes
> 
> I don't think it is any real difference with the current flag, both
> are determined at the first ioctl when the iommufd is presented and
> both would state permanently until the fd close

Well, the noiommu flag would be static from registration till
unregistration. :-) While no_iommu_access's life cycle is between the bind
and the fd close. But given that its major usage is during the window
between when the fd is bound to iommufd and when it is closed, it's still
possible to let no_iommu_access serve as the noiommu flag.
Yi Liu May 3, 2023, 9:57 a.m. UTC | #13
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 3, 2023 2:22 AM
> 
> On Sat, Apr 29, 2023 at 12:13:39AM +0800, Yi Liu wrote:
> 
> > > Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> > > device interface for userspace drivers that are going to do unsafe
> > > things regardless.  Enabling noiommu to work with mdev, pin pages, or
> > > anything else should not be on our agenda.  Userspaces relying on noiommu
> > > get the minimum viable interface and must impose a minuscule
> > > incremental maintenance burden.  The only reason we're spending so much
> > > effort on it here is to make iommufd noiommu support equivalent to
> > > group/container noiommu support.  We should stop at that.  Thanks,
> >
> > btw. I asked a question in [1] to check if we should allow attach/detach
> > on noiommu devices. Jason has replied it. If in future noiommu userspace
> > can pin page, then such userspace will need to attach/detach ioas. So I
> > made cdev series[2] to allow attach ioas on noiommu devices. Supporting
> > it from cdev day-1 may avoid probing if attach/detach is supported or
> > not for specific devices when adding pin page for noiommu userspace.
> >
> > But now, I think such a support will not in plan, is it? If so, will it
> > be better to disallow attach/detach on noiommu devices in patch [2]?
> >
> > [1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/
> > [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/
>
> If we block it then userspace has to act quite differently, I think we
> should keep it.

Maybe the kernel can simply fail the attach/detach if it happens on noiommu
devices, and noiommu userspace should just know it would fail. @Alex,
what is your opinion?

> My general idea to complete the no-iommu feature is to add a new IOCTL
> to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> would call instead of trying to abuse mlock and /proc/ to do it. That
> ioctl would use the IOAS attached to the access just like a mdev would
> do, so it has a real IOVA, but it is not a mdev.

This new ioctl could be an IOMMUFD ioctl, since its inputs are the IOAS
and an address, nothing related to the device. Is that right?

> unmap callback just does nothing, as Alex says it is all still totally
> unsafe.

Sure. That's also why I added a noiommu test to avoid calling the
unmap callback, although it seems impossible to end up in the unmap
callback, as only mdev drivers would implement it.

> 
> This just allows it use the mm a little more properly and safely (eg
> mlock() doesn't set things like page_maybe_dma_pinned(), proc doesn't
> reject things like DAX and it currently doesn't make an adjustment for
> the PCI offset stuff..) So it would make DPDK a little more robust,
> portable and make the whole VFIO no-iommu feature much easier to use.

Thanks for the explanation.

> To do that we need an iommufd access, an access ID and we need to link
> the current IOAS to the special access, like mdev, but in any mdev
> code paths.
> 
> That creating the access ID solves the reset problem as well is a nice
> side effect and is the only part of this you should focus on for now..

Yes. I get this part. We only need access ID so far to fix the noiommu
gap in hot-reset.

Regards,
Yi Liu
Jason Gunthorpe May 3, 2023, 7:41 p.m. UTC | #14
> > My general idea to complete the no-iommu feature is to add a new IOCTL
> > to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> > would call instead of trying to abuse mlock and /proc/ to do it. That
> > ioctl would use the IOAS attached to the access just like a mdev would
> > do, so it has a real IOVA, but it is not a mdev.
> 
> This new ioctl may be IOMMUFD ioctl since its input is the IOAS and
> addr, nothing related to the device. Is it?

No, definitely a VFIO special ioctl for VFIO no-iommu mode.

> Sure. That's also why I added a noiommu test to avoid calling
> unmap callback although it seems not possible to have unmap
> callback as mdev drivers would implement it.

Just have a special noiommu ops and use an empty unmap function and
pass that to the special noiommu access creation.

Jason
Jason Gunthorpe May 3, 2023, 7:42 p.m. UTC | #15
On Wed, May 03, 2023 at 09:48:36AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 3, 2023 2:12 AM
> > 
> > On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > > The emulated stuff is for mdev only, it should not be confused with
> > > > no-iommu
> > >
> > > hmmm. I guess the confusion is due to the reuse of
> > > vfio_iommufd_emulated_bind().
> > 
> > This is probably not a good direction
> 
> I see. But if not reusing, then there may be a few code duplications.
> I'm fine to add separate _bind/unbind() functions for noiommu devices
> if Alex and you prefer it.

I think you will find there is minimal duplication

Jason
Alex Williamson May 3, 2023, 10:49 p.m. UTC | #16
On Wed, 3 May 2023 16:41:52 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> > > My general idea to complete the no-iommu feature is to add a new IOCTL
> > > to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> > > would call instead of trying to abuse mlock and /proc/ to do it. That
> > > ioctl would use the IOAS attached to the access just like a mdev would
> > > do, so it has a real IOVA, but it is not a mdev.  
> > 
> > This new ioctl may be IOMMUFD ioctl since its input is the IOAS and
> > addr, nothing related to the device. Is it?  
> 
> No, definately a VFIO special ioctl for VFIO no-iommu mode.

This seems like brushing off the dirty work to vfio.  Userspace drivers
relying on no-iommu are in pretty questionable territory already, do we
care if they don't have a good means to pin pages and derive the DMA
address of those pinned pages?  As I noted earlier, I'm really not
interested in expanding no-iommu, it's less important now than when it
was added (we have vIOMMUs in VMs now and more platforms with IOMMUs)
and we shouldn't be encouraging its use by further developing the
interface.  Thanks,

Alex
Yi Liu May 8, 2023, 3:46 p.m. UTC | #17
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, May 3, 2023 5:49 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 3, 2023 2:12 AM
> >
> > On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > > The emulated stuff is for mdev only, it should not be confused with
> > > > no-iommu
> > >
> > > hmmm. I guess the confusion is due to the reuse of
> > > vfio_iommufd_emulated_bind().
> >
> > This is probably not a good direction
> 
> I see. But if not reusing, then there may be a few code duplications.
> I'm fine to add separate _bind/unbind() functions for noiommu devices
> if Alex and you prefer it.
> 
> > > > Eg if you had a no_iommu_access value to store the access it would be
> > > > fine and could serve as the 'this is no_iommu' flag
> > >
> > > So this no_iommu_access shall be created per iommufd bind, and call the
> > > iommufd_access_create() with iommufd_access_ops. is it? If so, this is
> > > not 100% the same with no_iommu flag as this flag is static after device
> > > registration.
> >
> > Something like that, yes
> >
> > I don't think it is any real difference with the current flag, both
> > are determined at the first ioctl when the iommufd is presented and
> > both would state permanently until the fd close
> 
> Well, noiommu flag would be static from registration till unregistration.:-)
> While no_iommu_access's life circle is between the bind and fd close. But
> given that the major usage of it is during the duration between fd is bound
> to iommufd and closed, so it's still possible to let no_iommu_access serve
> as noiommu flag. 

Patch

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 895852ad37ed..ca29c4feded3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -29,7 +29,8 @@  int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 		 */
 		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
 			return -EPERM;
-		return 0;
+
+		return vfio_iommufd_emulated_bind(vdev, ictx, &device_id);
 	}
 
 	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
@@ -59,8 +60,10 @@  void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vdev->noiommu)
+	if (vdev->noiommu) {
+		vfio_iommufd_emulated_unbind(vdev);
 		return;
+	}
 
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
@@ -110,10 +113,14 @@  int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
 /*
- * The emulated standard ops mean that vfio_device is going to use the
- * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
- * ops set should call vfio_register_emulated_iommu_dev(). Drivers that do
- * not call vfio_pin_pages()/vfio_dma_rw() have no need to provide dma_unmap.
+ * The emulated standard ops can be used in the following cases:
+ * 1) A vfio_device that is going to use the "mdev path" and will call
+ *    vfio_pin_pages()/vfio_dma_rw().  Such drivers should call
+ *    vfio_register_emulated_iommu_dev().  Drivers that do not call
+ *    vfio_pin_pages()/vfio_dma_rw() have no need to provide dma_unmap.
+ * 2) A noiommu device, which has no backend iommu; creating an
+ *    iommufd_access still allows generating a dev_id for it.  A noiommu
+ *    device is not allowed to do map/unmap, so this becomes a nop.
  */
 
 static void vfio_emulated_unmap(void *data, unsigned long iova,
@@ -121,7 +128,8 @@  static void vfio_emulated_unmap(void *data, unsigned long iova,
 {
 	struct vfio_device *vdev = data;
 
-	if (vdev->ops->dma_unmap)
+	/* noiommu devices cannot do map/unmap */
+	if (vdev->noiommu && vdev->ops->dma_unmap)
 		vdev->ops->dma_unmap(vdev, iova, length);
 }