[1/9] vfio: Make vfio_(un)register_notifier accept a vfio_device

Message ID 1-v1-a8faf768d202+125dd-vfio_mdev_no_group_jgg@nvidia.com (mailing list archive)
State New, archived
Series Make the rest of the VFIO driver interface use vfio_device

Commit Message

Jason Gunthorpe April 12, 2022, 3:53 p.m. UTC
All callers have a struct vfio_device trivially available, pass it in
directly and avoid calling the expensive vfio_group_get_from_dev().

To support the unconverted kvmgt mdev driver add
mdev_legacy_get_vfio_device() which will return the vfio_device pointer
vfio_mdev.c puts in the drv_data.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  | 15 +++++++++------
 drivers/s390/cio/vfio_ccw_ops.c   |  7 +++----
 drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++-------
 drivers/vfio/mdev/vfio_mdev.c     | 12 ++++++++++++
 drivers/vfio/vfio.c               | 25 +++++++------------------
 include/linux/mdev.h              |  1 +
 include/linux/vfio.h              |  4 ++--
 7 files changed, 41 insertions(+), 37 deletions(-)

Comments

Christoph Hellwig April 13, 2022, 5:55 a.m. UTC | #1
On Tue, Apr 12, 2022 at 12:53:28PM -0300, Jason Gunthorpe wrote:
> All callers have a struct vfio_device trivially available, pass it in
> directly and avoid calling the expensive vfio_group_get_from_dev().

Instead of bothering the drivers with the notifiers at all, the two
notifier_blocks should move into struct vfio_device, and the
vfio_ops should just grow two new dma_unmap and set_kvm methods.

This will isolate the drivers from the whole notifiers mess and its
boilerplate code.
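
Roughly, something along these lines (a sketch only; these callbacks do not
exist in vfio_device_ops today and the names here are made up):

/* Sketch of the suggestion above, not existing API: the vfio core would own
 * the notifier plumbing and call back into the driver, instead of each
 * driver registering notifier_blocks itself. */
struct vfio_device_ops {
        /* ... existing open_device/close_device/read/write/ioctl callbacks ... */
        void (*dma_unmap)(struct vfio_device *vdev, u64 iova, u64 length);
        void (*set_kvm)(struct vfio_device *vdev, struct kvm *kvm);
};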
Jason Gunthorpe April 13, 2022, 11:39 a.m. UTC | #2
On Wed, Apr 13, 2022 at 07:55:24AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 12, 2022 at 12:53:28PM -0300, Jason Gunthorpe wrote:
> > All callers have a struct vfio_device trivially available, pass it in
> > directly and avoid calling the expensive vfio_group_get_from_dev().
> 
> Instead of bothering the drivers with the notifiers at all, the two
> notifier_blocks should move into struct vfio_device, and the
> vfio_ops should just grow two new dma_unmap and set_kvm methods.
> 
> This will isolate the drives from the whole notifiers mess and it's
> boilerplate code.

I already looked into that for a while, it is a real mess too because
of how the notifiers are used by the current drivers, eg gvt assumes
the notifier is called during its open_device callback to setup its
kvm.

The unmap is less convoluted and I might still try to do that..

For this series I prefer to leave it alone

Jason
Christoph Hellwig April 13, 2022, 4:06 p.m. UTC | #3
On Wed, Apr 13, 2022 at 08:39:52AM -0300, Jason Gunthorpe wrote:
> I already looked into that for a while, it is a real mess too because
> of how the notifiers are used by the current drivers, eg gvt assumes
> the notifier is called during its open_device callback to setup its
> kvm.

gvt very much expects kvm to be set before open and thus gets the
catch-up notifier, yes.  And given that this is how qemu uses
the ioctl I think we can actually simplify this further and require
vfio_group_set_kvm to be called before open for s390/ap as well and
do away with this whole mess.
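
With such an ordering guarantee the driver side could collapse to something
like this (hypothetical sketch only, assuming the core stashes the kvm
pointer on the vfio_device before open; that field does not exist today):

/* Hypothetical sketch: if kvm were guaranteed to be set before
 * open_device(), the VFIO_GROUP_NOTIFY_SET_KVM notifier could go away and
 * the driver would consume the pointer directly. vdev->kvm is an assumed
 * field here, not part of the current struct vfio_device. */
static int vfio_ap_mdev_open_device(struct vfio_device *vdev)
{
        struct ap_matrix_mdev *matrix_mdev =
                container_of(vdev, struct ap_matrix_mdev, vdev);

        if (!vdev->kvm)
                return -EINVAL;

        return vfio_ap_mdev_set_kvm(matrix_mdev, vdev->kvm);
}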

> For this series I prefer to leave it alone

Ok, let's do it one step at a time.
Jason Gunthorpe April 13, 2022, 4:18 p.m. UTC | #4
On Wed, Apr 13, 2022 at 06:06:01PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 13, 2022 at 08:39:52AM -0300, Jason Gunthorpe wrote:
> > I already looked into that for a while, it is a real mess too because
> > of how the notifiers are used by the current drivers, eg gvt assumes
> > the notifier is called during its open_device callback to setup its
> > kvm.
> 
> gvt very much expects kvm to be set before open and thus get the
> cachup notifier, yes.  And given that this is how qemu uses
> the ioctl I think we can actually simplify this further and require
> vfio_group_set_kvm to be called before open for s390/ap as well and
> do away with this whole mess.

Yeah, I was thinking about that too, but on the other hand I think it
is completely wrong that gvt requires kvm at all. A vfio_device is not
supposed to be tightly linked to KVM - the only exception possibly
being s390..

Jason
Christoph Hellwig April 13, 2022, 4:29 p.m. UTC | #5
On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
> Yeah, I was thinking about that too, but on the other hand I think it
> is completely wrong that gvt requires kvm at all. A vfio_device is not
> supposed to be tightly linked to KVM - the only exception possibly
> being s390..

So i915/gvt uses it for:

 - poking into the KVM GFN translations
 - using the KVM page track notifier

No idea how these could be solved in a more generic way.
Jason Gunthorpe April 13, 2022, 5:37 p.m. UTC | #6
On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
> > Yeah, I was thinking about that too, but on the other hand I think it
> > is completely wrong that gvt requires kvm at all. A vfio_device is not
> > supposed to be tightly linked to KVM - the only exception possibly
> > being s390..
> 
> So i915/gvt uses it for:
> 
>  - poking into the KVM GFN translations
>  - using the KVM page track notifier
> 
> No idea how these could be solved in a more generic way.

TBH I'm not sure how any of this works fully correctly..

I see this code getting something it calls a GFN and then passing
them to vfio - which makes no sense. Either a value is a GFN - the
physical memory address of the VM, or it is an IOVA. VFIO only takes
in IOVA and kvm only takes in GFN. So these are probably IOVAs really..

But then, I see this code taking GFNs (which are probably IOVAs?) and
passing them to the kvm page track notifier? That can't be right, VFIO
needs to translate the IOVA to a GFN, not assume 1:1...

It seems the purpose is to shadow a page table, and it is capturing
user space CPU writes to this page table memory I guess?

GFNs seem to come from gen8_gtt_get_pfn, which seems to be parsing
some guest page table?

Jason
Wang, Zhi A April 13, 2022, 7:17 p.m. UTC | #7
On 4/13/22 5:37 PM, Jason Gunthorpe wrote:
> On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
>> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
>>> Yeah, I was thinking about that too, but on the other hand I think it
>>> is completely wrong that gvt requires kvm at all. A vfio_device is not
>>> supposed to be tightly linked to KVM - the only exception possibly
>>> being s390..
>>
>> So i915/gvt uses it for:
>>
>>  - poking into the KVM GFN translations
>>  - using the KVM page track notifier
>>
>> No idea how these could be solved in a more generic way.
> 
> TBH I'm not sure how any of this works fully correctly..
> 
> I see this code getting something it calls a GFN and then passing
> them to vfio - which makes no sense. Either a value is a GFN - the
> physical memory address of the VM, or it is an IOVA. VFIO only takes
> in IOVA and kvm only takes in GFN. So these are probably IOVAs really..
> 
Can you let me know the place? So that I can take a look.

> But then, I see this code taking GFNs (which are probably IOVAs?) and
> passing them to the kvm page track notifier? That can't be right, VFIO
> needs to translate the IOVA to a GFN, not assume 1:1...
> 
GFNs are from the guest page table. It takes the GFN from an entry belonging
to a guest page table and requests kvm_page_track to track it, so that
the shadow page table can be updated accordingly.

> It seems the purpose is to shadow a page table, and it is capturing
> user space CPU writes to this page table memory I guess?
> 
Yes. The shadow page table will be built according to the guest GPU page table.
When a guest workload is executed in the GPU, the root pointer of the
shadow page table in the shadow GPU context is used. If the host enables
the IOMMU, the pages used by the shadow page table need to be mapped as
IOVA, and the PFNs in the shadow entries are IOVAs.
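
Roughly, the flow looks like this (illustrative sketch only, with a
hypothetical helper; not the actual gvt/kvmgt code):

/* Illustrative sketch of the shadowing flow described above.
 * shadow_pin_guest_page() is a hypothetical stand-in for the real
 * pin/translate helper; the value taken from the guest entry is treated
 * as a "gfn" and handed to vfio, which is exactly the GFN/IOVA mixing
 * being discussed in this thread. */
int shadow_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
                          unsigned long size, dma_addr_t *dma_addr);

static int shadow_one_entry(struct intel_vgpu *vgpu, u64 guest_entry,
                            u64 *shadow_entry)
{
        unsigned long gfn = guest_entry >> PAGE_SHIFT; /* "gfn" from the guest PTE */
        dma_addr_t dma_addr;
        int ret;

        /* pin the guest page and obtain a host DMA address for it */
        ret = shadow_pin_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
        if (ret)
                return ret;

        /* the host address goes into the shadow entry, low flag bits are kept */
        *shadow_entry = dma_addr | (guest_entry & ~PAGE_MASK);
        return 0;
}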

> GFN's seems to come from gen8_gtt_get_pfn which seems to be parsing
> some guest page table?
> 
Yes. It's to extract the PFNs from a page table entry.

> Jason
>
Jason Gunthorpe April 13, 2022, 8:04 p.m. UTC | #8
On Wed, Apr 13, 2022 at 07:17:52PM +0000, Wang, Zhi A wrote:
> On 4/13/22 5:37 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
> >> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
> >>> Yeah, I was thinking about that too, but on the other hand I think it
> >>> is completely wrong that gvt requires kvm at all. A vfio_device is not
> >>> supposed to be tightly linked to KVM - the only exception possibly
> >>> being s390..
> >>
> >> So i915/gvt uses it for:
> >>
> >>  - poking into the KVM GFN translations
> >>  - using the KVM page track notifier
> >>
> >> No idea how these could be solved in a more generic way.
> > 
> > TBH I'm not sure how any of this works fully correctly..
> > 
> > I see this code getting something it calls a GFN and then passing
> > them to vfio - which makes no sense. Either a value is a GFN - the
> > physical memory address of the VM, or it is an IOVA. VFIO only takes
> > in IOVA and kvm only takes in GFN. So these are probably IOVAs really..
> > 
> Can you let me know the place? So that I can take a look.

Well, for instance:

static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
		unsigned long size, struct page **page)

There is no way that is a GFN, it is an IOVA.

> > It seems the purpose is to shadow a page table, and it is capturing
> > user space CPU writes to this page table memory I guess?
> > 
> Yes.The shadow page will be built according to the guest GPU page table.
> When a guest workload is executed in the GPU, the root pointer of the
> shadow page table in the shadow GPU context is used. If the host enables
> the IOMMU, the pages used by the shadow page table needs to be mapped as
> IOVA, and the PFNs in the shadow entries are IOVAs.

So if the page table in the guest has IOVA addresses then why can you
use them as GFNs?

Or is it that only the page table levels themselves are GFNs and the
actual DMAs are IOVA? The unclear mixing of GFN as IOVA in the code
makes it inscrutable.

Jason
Wang, Zhi A April 13, 2022, 9:08 p.m. UTC | #9
On 4/13/22 8:04 PM, Jason Gunthorpe wrote:
> On Wed, Apr 13, 2022 at 07:17:52PM +0000, Wang, Zhi A wrote:
>> On 4/13/22 5:37 PM, Jason Gunthorpe wrote:
>>> On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
>>>> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
>>>>> Yeah, I was thinking about that too, but on the other hand I think it
>>>>> is completely wrong that gvt requires kvm at all. A vfio_device is not
>>>>> supposed to be tightly linked to KVM - the only exception possibly
>>>>> being s390..
>>>>
>>>> So i915/gvt uses it for:
>>>>
>>>>  - poking into the KVM GFN translations
>>>>  - using the KVM page track notifier
>>>>
>>>> No idea how these could be solved in a more generic way.
>>>
>>> TBH I'm not sure how any of this works fully correctly..
>>>
>>> I see this code getting something it calls a GFN and then passing
>>> them to vfio - which makes no sense. Either a value is a GFN - the
>>> physical memory address of the VM, or it is an IOVA. VFIO only takes
>>> in IOVA and kvm only takes in GFN. So these are probably IOVAs really..
>>>
>> Can you let me know the place? So that I can take a look.
> 
> Well, for instance:
> 
> static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> 		unsigned long size, struct page **page)
> 
> There is no way that is a GFN, it is an IOVA.
> 
I see. The name is vague. There is a promised 1:1 mapping between guest GFN
and host IOVA when a PCI device is passed to a VM; I guess mdev is just
leveraging it as they are sharing the same code path in QEMU. It's in a
function called vfio_listener_region_add() in the source code of QEMU.
Are you planning to change the architecture? It would be nice to know your plan.
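
For reference, that 1:1 mapping boils down to QEMU issuing roughly the
following per guest RAM region (illustrative userspace sketch; the function
and parameter names are placeholders, not the QEMU code):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Illustrative only: roughly the mapping that vfio_listener_region_add()
 * ends up installing for each guest RAM region when no viommu is in use. */
static int map_guest_ram_1to1(int container_fd, void *host_va,
                              uint64_t guest_phys_addr, uint64_t region_size)
{
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)host_va,    /* HVA backing the guest RAM */
                .iova  = guest_phys_addr,       /* IOVA chosen equal to GPA */
                .size  = region_size,
        };

        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}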

>>> It seems the purpose is to shadow a page table, and it is capturing
>>> user space CPU writes to this page table memory I guess?
>>>
>> Yes.The shadow page will be built according to the guest GPU page table.
>> When a guest workload is executed in the GPU, the root pointer of the
>> shadow page table in the shadow GPU context is used. If the host enables
>> the IOMMU, the pages used by the shadow page table needs to be mapped as
>> IOVA, and the PFNs in the shadow entries are IOVAs.
> 
> So if the page table in the guest has IOVA addreses then why can you
> use them as GFNs?
> 
That's another problem. We don't support a guest enabling the guest IOMMU
(aka virtual IOMMU). The guest/virtual IOMMU is implemented in QEMU, and so
is the translation between guest IOVA and GFN. For an mdev model
implemented in the kernel, there isn't any mechanism so far to reach it.

People were discussing it before, but no agreement was reached. Is it
possible to implement it in the kernel? I would like to discuss it more
if there are any good ideas.

> Or is it that only the page table levels themselves are GFNs and the
> actual DMA's are IOVA? The unclear mixing of GFN as IOVA in the code
> makes it inscrutible.
>

No. Even though the HW is capable of controlling the level of translation,
it's not used like this in the existing driver. It's definitely an open
architectural question.
 
> Jason
>
Jason Gunthorpe April 13, 2022, 11:12 p.m. UTC | #10
On Wed, Apr 13, 2022 at 09:08:40PM +0000, Wang, Zhi A wrote:
> On 4/13/22 8:04 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 13, 2022 at 07:17:52PM +0000, Wang, Zhi A wrote:
> >> On 4/13/22 5:37 PM, Jason Gunthorpe wrote:
> >>> On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
> >>>> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
> >>>>> Yeah, I was thinking about that too, but on the other hand I think it
> >>>>> is completely wrong that gvt requires kvm at all. A vfio_device is not
> >>>>> supposed to be tightly linked to KVM - the only exception possibly
> >>>>> being s390..
> >>>>
> >>>> So i915/gvt uses it for:
> >>>>
> >>>>  - poking into the KVM GFN translations
> >>>>  - using the KVM page track notifier
> >>>>
> >>>> No idea how these could be solved in a more generic way.
> >>>
> >>> TBH I'm not sure how any of this works fully correctly..
> >>>
> >>> I see this code getting something it calls a GFN and then passing
> >>> them to vfio - which makes no sense. Either a value is a GFN - the
> >>> physical memory address of the VM, or it is an IOVA. VFIO only takes
> >>> in IOVA and kvm only takes in GFN. So these are probably IOVAs really..
> >>>
> >> Can you let me know the place? So that I can take a look.
> > 
> > Well, for instance:
> > 
> > static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> > 		unsigned long size, struct page **page)
> > 
> > There is no way that is a GFN, it is an IOVA.
> > 
> I see. The name is vague. There is an promised 1:1 mapping between guest GFN
> and host IOVA when a PCI device is passed to a VM, I guess mdev is just
> leveraging it as they are sharing the same code path in QEMU.

That has never been true. It happens to be the case in some common scenarios.

> > So if the page table in the guest has IOVA addreses then why can you
> > use them as GFNs?
> 
> That's another problem. We don't support a guess enabling the guest IOMMU
> (aka virtual IOMMU). The guest/virtual IOMMU is implemented in QEMU, so
> does the translation between guest IOVA and GFN. For a mdev model
> implemented in the kernel, there isn't any mechanism so far to reach there.

And this is the uncommon scenario: there is no way for the mdev driver
to know if the viommu is turned on, and AFAIK no way to block it from VFIO.

> People were discussing it before. But none agreement was achieved. Is it
> possible to implement it in the kernel? Would like to discuss more about it
> if there is any good idea.

I don't know of anything; VFIO and kvm are not intended to be tightly
linked like this, and they don't have the same view of the world.

Jason
Tian, Kevin April 14, 2022, 2:04 a.m. UTC | #11
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, April 14, 2022 7:12 AM
> 
> On Wed, Apr 13, 2022 at 09:08:40PM +0000, Wang, Zhi A wrote:
> > On 4/13/22 8:04 PM, Jason Gunthorpe wrote:
> > > On Wed, Apr 13, 2022 at 07:17:52PM +0000, Wang, Zhi A wrote:
> > >> On 4/13/22 5:37 PM, Jason Gunthorpe wrote:
> > >>> On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
> > >>>> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
> > >>>>> Yeah, I was thinking about that too, but on the other hand I think it
> > >>>>> is completely wrong that gvt requires kvm at all. A vfio_device is not
> > >>>>> supposed to be tightly linked to KVM - the only exception possibly
> > >>>>> being s390..
> > >>>>
> > >>>> So i915/gvt uses it for:
> > >>>>
> > >>>>  - poking into the KVM GFN translations

The only user of this is is_2MB_gtt_possible() which I suppose should
go through vfio instead of kvm as it actually means IOVA here.

> > >>>>  - using the KVM page track notifier

This is the real reason for the mess, as write-protecting
CPU access to certain guest memory has to go through KVM.

> > >>>>
> > >>>> No idea how these could be solved in a more generic way.
> > >>>
> > >>> TBH I'm not sure how any of this works fully correctly..
> > >>>
> > >>> I see this code getting something it calls a GFN and then passing
> > >>> them to vfio - which makes no sense. Either a value is a GFN - the
> > >>> physical memory address of the VM, or it is an IOVA. VFIO only takes
> > >>> in IOVA and kvm only takes in GFN. So these are probably IOVAs really..
> > >>>
> > >> Can you let me know the place? So that I can take a look.
> > >
> > > Well, for instance:
> > >
> > > static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> > > 		unsigned long size, struct page **page)
> > >
> > > There is no way that is a GFN, it is an IOVA.
> > >
> > I see. The name is vague. There is an promised 1:1 mapping between guest
> GFN
> > and host IOVA when a PCI device is passed to a VM, I guess mdev is just
> > leveraging it as they are sharing the same code path in QEMU.
> 
> That has never been true. It happens to be the case in some common
> scenarios.
> 
> > > So if the page table in the guest has IOVA addreses then why can you
> > > use them as GFNs?
> >
> > That's another problem. We don't support a guess enabling the guest
> IOMMU
> > (aka virtual IOMMU). The guest/virtual IOMMU is implemented in QEMU,
> so
> > does the translation between guest IOVA and GFN. For a mdev model
> > implemented in the kernel, there isn't any mechanism so far to reach there.
> 
> And this is the uncommon scenario, there is no way for the mdev driver
> to know if viommu is turned on, and AFAIK, no way to block it from VFIO.
> 
> > People were discussing it before. But none agreement was achieved. Is it
> > possible to implement it in the kernel? Would like to discuss more about it
> > if there is any good idea.
> 
> I don't know of anything, VFIO and kvm are not intended to be tightly
> linked like this, they don't have the same view of the world.
> 

Yes, this is the main problem. VFIO only cares about IOVA and KVM
only cares about GPA. GVT as an mdev driver should follow VFIO
in concept, but due to the requirement of GPU page table shadowing
it needs to call into KVM for write-protecting CPU access to GPA.

What about extending the KVM page tracking interface to accept HVA?
This is probably the only common denominator between VFIO and
KVM that would allow dissolving this conceptual disconnect...
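
Something along these lines (purely hypothetical sketch; no such interface
exists in KVM today):

/* Hypothetical HVA-keyed page-track interface: a vfio driver could ask for
 * write-protection of guest memory it only knows by HVA, and KVM would
 * resolve HVA -> memslot/GFN internally. These functions do not exist. */
int kvm_page_track_add_hva(struct kvm *kvm, unsigned long hva,
                           unsigned long size);
int kvm_page_track_remove_hva(struct kvm *kvm, unsigned long hva,
                              unsigned long size);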

Thanks
Kevin
Tian, Kevin April 14, 2022, 2:15 a.m. UTC | #12
> From: Wang, Zhi A <zhi.a.wang@intel.com>
> Sent: Thursday, April 14, 2022 5:09 AM
> 
> > Or is it that only the page table levels themselves are GFNs and the
> > actual DMA's are IOVA? The unclear mixing of GFN as IOVA in the code
> > makes it inscrutible.
> >
> 
> No. Even the HW is capable of controlling the level of translation, but
> it's not used like this in the existing driver. It's definitely an
> architecture open.
> 

There is no open question on this. Any guest memory address that the vGPU
accesses must be an IOVA, including the page table levels. There is only
one address space per vRID.
Eric Farman April 14, 2022, 7:25 p.m. UTC | #13
On Tue, 2022-04-12 at 12:53 -0300, Jason Gunthorpe wrote:
> All callers have a struct vfio_device trivially available, pass it in
> directly and avoid calling the expensive vfio_group_get_from_dev().
> 
> To support the unconverted kvmgt mdev driver add
> mdev_legacy_get_vfio_device() which will return the vfio_device
> pointer
> vfio_mdev.c puts in the drv_data.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c  | 15 +++++++++------
>  drivers/s390/cio/vfio_ccw_ops.c   |  7 +++----
>  drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++-------
>  drivers/vfio/mdev/vfio_mdev.c     | 12 ++++++++++++
>  drivers/vfio/vfio.c               | 25 +++++++------------------
>  include/linux/mdev.h              |  1 +
>  include/linux/vfio.h              |  4 ++--
>  7 files changed, 41 insertions(+), 37 deletions(-)

For the -ccw bits:

Acked-by: Eric Farman <farman@linux.ibm.com>

> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 057ec449010458..bb59d21cf898ab 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -904,6 +904,7 @@ static int intel_vgpu_group_notifier(struct
> notifier_block *nb,
>  
>  static int intel_vgpu_open_device(struct mdev_device *mdev)
>  {
> +	struct vfio_device *vfio_dev =
> mdev_legacy_get_vfio_device(mdev);
>  	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
>  	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
>  	unsigned long events;
> @@ -914,7 +915,7 @@ static int intel_vgpu_open_device(struct
> mdev_device *mdev)
>  	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
>  
>  	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> -	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> &events,
> +	ret = vfio_register_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
> &events,
>  				&vdev->iommu_notifier);
>  	if (ret != 0) {
>  		gvt_vgpu_err("vfio_register_notifier for iommu failed:
> %d\n",
> @@ -923,7 +924,7 @@ static int intel_vgpu_open_device(struct
> mdev_device *mdev)
>  	}
>  
>  	events = VFIO_GROUP_NOTIFY_SET_KVM;
> -	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> &events,
> +	ret = vfio_register_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
> &events,
>  				&vdev->group_notifier);
>  	if (ret != 0) {
>  		gvt_vgpu_err("vfio_register_notifier for group failed:
> %d\n",
> @@ -961,11 +962,11 @@ static int intel_vgpu_open_device(struct
> mdev_device *mdev)
>  	vdev->vfio_group = NULL;
>  
>  undo_register:
> -	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
>  					&vdev->group_notifier);
>  
>  undo_iommu:
> -	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +	vfio_unregister_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
>  					&vdev->iommu_notifier);
>  out:
>  	return ret;
> @@ -988,6 +989,7 @@ static void __intel_vgpu_release(struct
> intel_vgpu *vgpu)
>  	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
>  	struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
>  	struct kvmgt_guest_info *info;
> +	struct vfio_device *vfio_dev;
>  	int ret;
>  
>  	if (!handle_valid(vgpu->handle))
> @@ -998,12 +1000,13 @@ static void __intel_vgpu_release(struct
> intel_vgpu *vgpu)
>  
>  	intel_gvt_ops->vgpu_release(vgpu);
>  
> -	ret = vfio_unregister_notifier(mdev_dev(vdev->mdev),
> VFIO_IOMMU_NOTIFY,
> +	vfio_dev = mdev_legacy_get_vfio_device(vdev->mdev);
> +	ret = vfio_unregister_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
>  					&vdev->iommu_notifier);
>  	drm_WARN(&i915->drm, ret,
>  		 "vfio_unregister_notifier for iommu failed: %d\n",
> ret);
>  
> -	ret = vfio_unregister_notifier(mdev_dev(vdev->mdev),
> VFIO_GROUP_NOTIFY,
> +	ret = vfio_unregister_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
>  					&vdev->group_notifier);
>  	drm_WARN(&i915->drm, ret,
>  		 "vfio_unregister_notifier for group failed: %d\n",
> ret);
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c
> b/drivers/s390/cio/vfio_ccw_ops.c
> index d8589afac272f1..e1ce24d8fb2555 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -183,7 +183,7 @@ static int vfio_ccw_mdev_open_device(struct
> vfio_device *vdev)
>  
>  	private->nb.notifier_call = vfio_ccw_mdev_notifier;
>  
> -	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> +	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY,
>  				     &events, &private->nb);
>  	if (ret)
>  		return ret;
> @@ -204,8 +204,7 @@ static int vfio_ccw_mdev_open_device(struct
> vfio_device *vdev)
>  
>  out_unregister:
>  	vfio_ccw_unregister_dev_regions(private);
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> -				 &private->nb);
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY, &private-
> >nb);
>  	return ret;
>  }
>  
> @@ -223,7 +222,7 @@ static void vfio_ccw_mdev_close_device(struct
> vfio_device *vdev)
>  
>  	cp_free(&private->cp);
>  	vfio_ccw_unregister_dev_regions(private);
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> &private->nb);
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY, &private-
> >nb);
>  }
>  
>  static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private
> *private,
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 6e08d04b605d6e..69768061cd7bd9 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1406,21 +1406,21 @@ static int vfio_ap_mdev_open_device(struct
> vfio_device *vdev)
>  	matrix_mdev->group_notifier.notifier_call =
> vfio_ap_mdev_group_notifier;
>  	events = VFIO_GROUP_NOTIFY_SET_KVM;
>  
> -	ret = vfio_register_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> -				     &events, &matrix_mdev-
> >group_notifier);
> +	ret = vfio_register_notifier(vdev, VFIO_GROUP_NOTIFY, &events,
> +				     &matrix_mdev->group_notifier);
>  	if (ret)
>  		return ret;
>  
>  	matrix_mdev->iommu_notifier.notifier_call =
> vfio_ap_mdev_iommu_notifier;
>  	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> -	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> -				     &events, &matrix_mdev-
> >iommu_notifier);
> +	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY, &events,
> +				     &matrix_mdev->iommu_notifier);
>  	if (ret)
>  		goto out_unregister_group;
>  	return 0;
>  
>  out_unregister_group:
> -	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
>  				 &matrix_mdev->group_notifier);
>  	return ret;
>  }
> @@ -1430,9 +1430,9 @@ static void vfio_ap_mdev_close_device(struct
> vfio_device *vdev)
>  	struct ap_matrix_mdev *matrix_mdev =
>  		container_of(vdev, struct ap_matrix_mdev, vdev);
>  
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY,
>  				 &matrix_mdev->iommu_notifier);
> -	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
>  				 &matrix_mdev->group_notifier);
>  	vfio_ap_mdev_unset_kvm(matrix_mdev);
>  }
> diff --git a/drivers/vfio/mdev/vfio_mdev.c
> b/drivers/vfio/mdev/vfio_mdev.c
> index a90e24b0c851d3..91605c1e8c8f94 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -17,6 +17,18 @@
>  
>  #include "mdev_private.h"
>  
> +/*
> + * Return the struct vfio_device for the mdev when using the legacy
> + * vfio_mdev_dev_ops path. No new callers to this function should be
> added.
> + */
> +struct vfio_device *mdev_legacy_get_vfio_device(struct mdev_device
> *mdev)
> +{
> +	if (WARN_ON(mdev->dev.driver != &vfio_mdev_driver.driver))
> +		return NULL;
> +	return dev_get_drvdata(&mdev->dev);
> +}
> +EXPORT_SYMBOL_GPL(mdev_legacy_get_vfio_device);
> +
>  static int vfio_mdev_open_device(struct vfio_device *core_vdev)
>  {
>  	struct mdev_device *mdev = to_mdev_device(core_vdev->dev);
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index a4555014bd1e72..8a5c46aa2bef61 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -2484,19 +2484,15 @@ static int
> vfio_unregister_group_notifier(struct vfio_group *group,
>  	return ret;
>  }
>  
> -int vfio_register_notifier(struct device *dev, enum vfio_notify_type
> type,
> +int vfio_register_notifier(struct vfio_device *dev, enum
> vfio_notify_type type,
>  			   unsigned long *events, struct notifier_block
> *nb)
>  {
> -	struct vfio_group *group;
> +	struct vfio_group *group = dev->group;
>  	int ret;
>  
> -	if (!dev || !nb || !events || (*events == 0))
> +	if (!nb || !events || (*events == 0))
>  		return -EINVAL;
>  
> -	group = vfio_group_get_from_dev(dev);
> -	if (!group)
> -		return -ENODEV;
> -
>  	switch (type) {
>  	case VFIO_IOMMU_NOTIFY:
>  		ret = vfio_register_iommu_notifier(group, events, nb);
> @@ -2507,25 +2503,20 @@ int vfio_register_notifier(struct device
> *dev, enum vfio_notify_type type,
>  	default:
>  		ret = -EINVAL;
>  	}
> -
> -	vfio_group_put(group);
>  	return ret;
>  }
>  EXPORT_SYMBOL(vfio_register_notifier);
>  
> -int vfio_unregister_notifier(struct device *dev, enum
> vfio_notify_type type,
> +int vfio_unregister_notifier(struct vfio_device *dev,
> +			     enum vfio_notify_type type,
>  			     struct notifier_block *nb)
>  {
> -	struct vfio_group *group;
> +	struct vfio_group *group = dev->group;
>  	int ret;
>  
> -	if (!dev || !nb)
> +	if (!nb)
>  		return -EINVAL;
>  
> -	group = vfio_group_get_from_dev(dev);
> -	if (!group)
> -		return -ENODEV;
> -
>  	switch (type) {
>  	case VFIO_IOMMU_NOTIFY:
>  		ret = vfio_unregister_iommu_notifier(group, nb);
> @@ -2536,8 +2527,6 @@ int vfio_unregister_notifier(struct device
> *dev, enum vfio_notify_type type,
>  	default:
>  		ret = -EINVAL;
>  	}
> -
> -	vfio_group_put(group);
>  	return ret;
>  }
>  EXPORT_SYMBOL(vfio_unregister_notifier);
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 15d03f6532d073..67d07220a28f29 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -29,6 +29,7 @@ static inline struct mdev_device
> *to_mdev_device(struct device *dev)
>  unsigned int mdev_get_type_group_id(struct mdev_device *mdev);
>  unsigned int mtype_get_type_group_id(struct mdev_type *mtype);
>  struct device *mtype_get_parent_dev(struct mdev_type *mtype);
> +struct vfio_device *mdev_legacy_get_vfio_device(struct mdev_device
> *mdev);
>  
>  /**
>   * struct mdev_parent_ops - Structure to be registered for each
> parent device to
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 66dda06ec42d1b..748ec0e0293aea 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -178,11 +178,11 @@ enum vfio_notify_type {
>  /* events for VFIO_GROUP_NOTIFY */
>  #define VFIO_GROUP_NOTIFY_SET_KVM	BIT(0)
>  
> -extern int vfio_register_notifier(struct device *dev,
> +extern int vfio_register_notifier(struct vfio_device *dev,
>  				  enum vfio_notify_type type,
>  				  unsigned long *required_events,
>  				  struct notifier_block *nb);
> -extern int vfio_unregister_notifier(struct device *dev,
> +extern int vfio_unregister_notifier(struct vfio_device *dev,
>  				    enum vfio_notify_type type,
>  				    struct notifier_block *nb);
>
Anthony Krowiak April 18, 2022, 3:28 p.m. UTC | #14
On 4/12/22 11:53 AM, Jason Gunthorpe wrote:
> All callers have a struct vfio_device trivially available, pass it in
> directly and avoid calling the expensive vfio_group_get_from_dev().
>
> To support the unconverted kvmgt mdev driver add
> mdev_legacy_get_vfio_device() which will return the vfio_device pointer
> vfio_mdev.c puts in the drv_data.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/gpu/drm/i915/gvt/kvmgt.c  | 15 +++++++++------
>   drivers/s390/cio/vfio_ccw_ops.c   |  7 +++----
>   drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++-------
>   drivers/vfio/mdev/vfio_mdev.c     | 12 ++++++++++++
>   drivers/vfio/vfio.c               | 25 +++++++------------------
>   include/linux/mdev.h              |  1 +
>   include/linux/vfio.h              |  4 ++--
>   7 files changed, 41 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 057ec449010458..bb59d21cf898ab 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -904,6 +904,7 @@ static int intel_vgpu_group_notifier(struct notifier_block *nb,
>   
>   static int intel_vgpu_open_device(struct mdev_device *mdev)
>   {
> +	struct vfio_device *vfio_dev = mdev_legacy_get_vfio_device(mdev);
>   	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
>   	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
>   	unsigned long events;
> @@ -914,7 +915,7 @@ static int intel_vgpu_open_device(struct mdev_device *mdev)
>   	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
>   
>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> -	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, &events,
> +	ret = vfio_register_notifier(vfio_dev, VFIO_IOMMU_NOTIFY, &events,
>   				&vdev->iommu_notifier);
>   	if (ret != 0) {
>   		gvt_vgpu_err("vfio_register_notifier for iommu failed: %d\n",
> @@ -923,7 +924,7 @@ static int intel_vgpu_open_device(struct mdev_device *mdev)
>   	}
>   
>   	events = VFIO_GROUP_NOTIFY_SET_KVM;
> -	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, &events,
> +	ret = vfio_register_notifier(vfio_dev, VFIO_GROUP_NOTIFY, &events,
>   				&vdev->group_notifier);
>   	if (ret != 0) {
>   		gvt_vgpu_err("vfio_register_notifier for group failed: %d\n",
> @@ -961,11 +962,11 @@ static int intel_vgpu_open_device(struct mdev_device *mdev)
>   	vdev->vfio_group = NULL;
>   
>   undo_register:
> -	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
>   					&vdev->group_notifier);
>   
>   undo_iommu:
> -	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +	vfio_unregister_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
>   					&vdev->iommu_notifier);
>   out:
>   	return ret;
> @@ -988,6 +989,7 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
>   	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
>   	struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
>   	struct kvmgt_guest_info *info;
> +	struct vfio_device *vfio_dev;
>   	int ret;
>   
>   	if (!handle_valid(vgpu->handle))
> @@ -998,12 +1000,13 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
>   
>   	intel_gvt_ops->vgpu_release(vgpu);
>   
> -	ret = vfio_unregister_notifier(mdev_dev(vdev->mdev), VFIO_IOMMU_NOTIFY,
> +	vfio_dev = mdev_legacy_get_vfio_device(vdev->mdev);
> +	ret = vfio_unregister_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
>   					&vdev->iommu_notifier);
>   	drm_WARN(&i915->drm, ret,
>   		 "vfio_unregister_notifier for iommu failed: %d\n", ret);
>   
> -	ret = vfio_unregister_notifier(mdev_dev(vdev->mdev), VFIO_GROUP_NOTIFY,
> +	ret = vfio_unregister_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
>   					&vdev->group_notifier);
>   	drm_WARN(&i915->drm, ret,
>   		 "vfio_unregister_notifier for group failed: %d\n", ret);
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index d8589afac272f1..e1ce24d8fb2555 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -183,7 +183,7 @@ static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
>   
>   	private->nb.notifier_call = vfio_ccw_mdev_notifier;
>   
> -	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> +	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY,
>   				     &events, &private->nb);
>   	if (ret)
>   		return ret;
> @@ -204,8 +204,7 @@ static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
>   
>   out_unregister:
>   	vfio_ccw_unregister_dev_regions(private);
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> -				 &private->nb);
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY, &private->nb);
>   	return ret;
>   }
>   
> @@ -223,7 +222,7 @@ static void vfio_ccw_mdev_close_device(struct vfio_device *vdev)
>   
>   	cp_free(&private->cp);
>   	vfio_ccw_unregister_dev_regions(private);
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY, &private->nb);
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY, &private->nb);
>   }
>   
>   static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private *private,
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 6e08d04b605d6e..69768061cd7bd9 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1406,21 +1406,21 @@ static int vfio_ap_mdev_open_device(struct vfio_device *vdev)
>   	matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
>   	events = VFIO_GROUP_NOTIFY_SET_KVM;
>   
> -	ret = vfio_register_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> -				     &events, &matrix_mdev->group_notifier);
> +	ret = vfio_register_notifier(vdev, VFIO_GROUP_NOTIFY, &events,
> +				     &matrix_mdev->group_notifier);
>   	if (ret)
>   		return ret;
>   
>   	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> -	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> -				     &events, &matrix_mdev->iommu_notifier);
> +	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY, &events,
> +				     &matrix_mdev->iommu_notifier);
>   	if (ret)
>   		goto out_unregister_group;
>   	return 0;
>   
>   out_unregister_group:
> -	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
>   				 &matrix_mdev->group_notifier);
>   	return ret;
>   }
> @@ -1430,9 +1430,9 @@ static void vfio_ap_mdev_close_device(struct vfio_device *vdev)
>   	struct ap_matrix_mdev *matrix_mdev =
>   		container_of(vdev, struct ap_matrix_mdev, vdev);
>   
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY,
>   				 &matrix_mdev->iommu_notifier);
> -	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
>   				 &matrix_mdev->group_notifier);
>   	vfio_ap_mdev_unset_kvm(matrix_mdev);
>   }
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index a90e24b0c851d3..91605c1e8c8f94 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -17,6 +17,18 @@
>   
>   #include "mdev_private.h"
>   
> +/*
> + * Return the struct vfio_device for the mdev when using the legacy
> + * vfio_mdev_dev_ops path. No new callers to this function should be added.
> + */
> +struct vfio_device *mdev_legacy_get_vfio_device(struct mdev_device *mdev)
> +{
> +	if (WARN_ON(mdev->dev.driver != &vfio_mdev_driver.driver))
> +		return NULL;
> +	return dev_get_drvdata(&mdev->dev);
> +}
> +EXPORT_SYMBOL_GPL(mdev_legacy_get_vfio_device);
> +
>   static int vfio_mdev_open_device(struct vfio_device *core_vdev)
>   {
>   	struct mdev_device *mdev = to_mdev_device(core_vdev->dev);
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index a4555014bd1e72..8a5c46aa2bef61 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -2484,19 +2484,15 @@ static int vfio_unregister_group_notifier(struct vfio_group *group,
>   	return ret;
>   }
>   
> -int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
> +int vfio_register_notifier(struct vfio_device *dev, enum vfio_notify_type type,
>   			   unsigned long *events, struct notifier_block *nb)
>   {
> -	struct vfio_group *group;
> +	struct vfio_group *group = dev->group;

Is there a guarantee that dev != NULL? The original code below checks
the value of dev, so why is that check eliminated here?

>   	int ret;
>   
> -	if (!dev || !nb || !events || (*events == 0))
> +	if (!nb || !events || (*events == 0))
>   		return -EINVAL;
>   
> -	group = vfio_group_get_from_dev(dev);
> -	if (!group)
> -		return -ENODEV;
> -
>   	switch (type) {
>   	case VFIO_IOMMU_NOTIFY:
>   		ret = vfio_register_iommu_notifier(group, events, nb);
> @@ -2507,25 +2503,20 @@ int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
>   	default:
>   		ret = -EINVAL;
>   	}
> -
> -	vfio_group_put(group);
>   	return ret;
>   }
>   EXPORT_SYMBOL(vfio_register_notifier);
>   
> -int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
> +int vfio_unregister_notifier(struct vfio_device *dev,
> +			     enum vfio_notify_type type,
>   			     struct notifier_block *nb)
>   {
> -	struct vfio_group *group;
> +	struct vfio_group *group = dev->group;

Same comment as above, no NULL check here.

>   	int ret;
>   
> -	if (!dev || !nb)
> +	if (!nb)
>   		return -EINVAL;
>   
> -	group = vfio_group_get_from_dev(dev);
> -	if (!group)
> -		return -ENODEV;
> -
>   	switch (type) {
>   	case VFIO_IOMMU_NOTIFY:
>   		ret = vfio_unregister_iommu_notifier(group, nb);
> @@ -2536,8 +2527,6 @@ int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
>   	default:
>   		ret = -EINVAL;
>   	}
> -
> -	vfio_group_put(group);
>   	return ret;
>   }
>   EXPORT_SYMBOL(vfio_unregister_notifier);
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 15d03f6532d073..67d07220a28f29 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -29,6 +29,7 @@ static inline struct mdev_device *to_mdev_device(struct device *dev)
>   unsigned int mdev_get_type_group_id(struct mdev_device *mdev);
>   unsigned int mtype_get_type_group_id(struct mdev_type *mtype);
>   struct device *mtype_get_parent_dev(struct mdev_type *mtype);
> +struct vfio_device *mdev_legacy_get_vfio_device(struct mdev_device *mdev);
>   
>   /**
>    * struct mdev_parent_ops - Structure to be registered for each parent device to
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 66dda06ec42d1b..748ec0e0293aea 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -178,11 +178,11 @@ enum vfio_notify_type {
>   /* events for VFIO_GROUP_NOTIFY */
>   #define VFIO_GROUP_NOTIFY_SET_KVM	BIT(0)
>   
> -extern int vfio_register_notifier(struct device *dev,
> +extern int vfio_register_notifier(struct vfio_device *dev,
>   				  enum vfio_notify_type type,
>   				  unsigned long *required_events,
>   				  struct notifier_block *nb);
> -extern int vfio_unregister_notifier(struct device *dev,
> +extern int vfio_unregister_notifier(struct vfio_device *dev,
>   				    enum vfio_notify_type type,
>   				    struct notifier_block *nb);
>
Jason J. Herne April 18, 2022, 3:29 p.m. UTC | #15
On 4/12/22 11:53, Jason Gunthorpe wrote:
> All callers have a struct vfio_device trivially available, pass it in
> directly and avoid calling the expensive vfio_group_get_from_dev().
> 
> To support the unconverted kvmgt mdev driver add
> mdev_legacy_get_vfio_device() which will return the vfio_device pointer
> vfio_mdev.c puts in the drv_data.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ...
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 6e08d04b605d6e..69768061cd7bd9 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1406,21 +1406,21 @@ static int vfio_ap_mdev_open_device(struct vfio_device *vdev)
>   	matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
>   	events = VFIO_GROUP_NOTIFY_SET_KVM;
>   
> -	ret = vfio_register_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> -				     &events, &matrix_mdev->group_notifier);
> +	ret = vfio_register_notifier(vdev, VFIO_GROUP_NOTIFY, &events,
> +				     &matrix_mdev->group_notifier);
>   	if (ret)
>   		return ret;
>   
>   	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> -	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> -				     &events, &matrix_mdev->iommu_notifier);
> +	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY, &events,
> +				     &matrix_mdev->iommu_notifier);
>   	if (ret)
>   		goto out_unregister_group;
>   	return 0;
>   
>   out_unregister_group:
> -	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
>   				 &matrix_mdev->group_notifier);
>   	return ret;
>   }
> @@ -1430,9 +1430,9 @@ static void vfio_ap_mdev_close_device(struct vfio_device *vdev)
>   	struct ap_matrix_mdev *matrix_mdev =
>   		container_of(vdev, struct ap_matrix_mdev, vdev);
>   
> -	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY,
>   				 &matrix_mdev->iommu_notifier);
> -	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
> +	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
>   				 &matrix_mdev->group_notifier);
>   	vfio_ap_mdev_unset_kvm(matrix_mdev);
>   }
looks good for vfio_ap.
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Jason Gunthorpe April 18, 2022, 3:44 p.m. UTC | #16
On Mon, Apr 18, 2022 at 11:28:30AM -0400, Tony Krowiak wrote:
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index a4555014bd1e72..8a5c46aa2bef61 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -2484,19 +2484,15 @@ static int vfio_unregister_group_notifier(struct vfio_group *group,
> >   	return ret;
> >   }
> > -int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
> > +int vfio_register_notifier(struct vfio_device *dev, enum vfio_notify_type type,
> >   			   unsigned long *events, struct notifier_block *nb)
> >   {
> > -	struct vfio_group *group;
> > +	struct vfio_group *group = dev->group;
> 
> Is there a guarantee that dev != NULL? The original code below checks
> the value of dev, so why is that check eliminated here?

Yes, no kernel driver calls this with null dev. The original code
should have been a WARN_ON as it is just protecting against a buggy
driver. In this case if the driver is buggy we simply generate a
backtrace through a null deref panic.
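
I.e. the defensive variant would look something like this (sketch only, not
what the patch does; the patch simply drops the check):

        if (WARN_ON(!dev || !nb || !events || *events == 0))
                return -EINVAL;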

Jason
Anthony Krowiak April 18, 2022, 3:52 p.m. UTC | #17
On 4/18/22 11:44 AM, Jason Gunthorpe wrote:
> On Mon, Apr 18, 2022 at 11:28:30AM -0400, Tony Krowiak wrote:
>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>>> index a4555014bd1e72..8a5c46aa2bef61 100644
>>> +++ b/drivers/vfio/vfio.c
>>> @@ -2484,19 +2484,15 @@ static int vfio_unregister_group_notifier(struct vfio_group *group,
>>>    	return ret;
>>>    }
>>> -int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
>>> +int vfio_register_notifier(struct vfio_device *dev, enum vfio_notify_type type,
>>>    			   unsigned long *events, struct notifier_block *nb)
>>>    {
>>> -	struct vfio_group *group;
>>> +	struct vfio_group *group = dev->group;
>> Is there a guarantee that dev != NULL? The original code below checks
>> the value of dev, so why is that check eliminated here?
> Yes, no kernel driver calls this with null dev. The original code
> should have been a WARN_ON as it is just protecting against a buggy
> driver. In this case if the driver is buggy we simply generate a
> backtrace through a null deref panic.
>
> Jason

Regarding the vfio_ap parts:
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>

Patch

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 057ec449010458..bb59d21cf898ab 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -904,6 +904,7 @@  static int intel_vgpu_group_notifier(struct notifier_block *nb,
 
 static int intel_vgpu_open_device(struct mdev_device *mdev)
 {
+	struct vfio_device *vfio_dev = mdev_legacy_get_vfio_device(mdev);
 	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
 	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
 	unsigned long events;
@@ -914,7 +915,7 @@  static int intel_vgpu_open_device(struct mdev_device *mdev)
 	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
 
 	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
-	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, &events,
+	ret = vfio_register_notifier(vfio_dev, VFIO_IOMMU_NOTIFY, &events,
 				&vdev->iommu_notifier);
 	if (ret != 0) {
 		gvt_vgpu_err("vfio_register_notifier for iommu failed: %d\n",
@@ -923,7 +924,7 @@  static int intel_vgpu_open_device(struct mdev_device *mdev)
 	}
 
 	events = VFIO_GROUP_NOTIFY_SET_KVM;
-	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, &events,
+	ret = vfio_register_notifier(vfio_dev, VFIO_GROUP_NOTIFY, &events,
 				&vdev->group_notifier);
 	if (ret != 0) {
 		gvt_vgpu_err("vfio_register_notifier for group failed: %d\n",
@@ -961,11 +962,11 @@  static int intel_vgpu_open_device(struct mdev_device *mdev)
 	vdev->vfio_group = NULL;
 
 undo_register:
-	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+	vfio_unregister_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
 					&vdev->group_notifier);
 
 undo_iommu:
-	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+	vfio_unregister_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
 					&vdev->iommu_notifier);
 out:
 	return ret;
@@ -988,6 +989,7 @@  static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
 	struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
 	struct kvmgt_guest_info *info;
+	struct vfio_device *vfio_dev;
 	int ret;
 
 	if (!handle_valid(vgpu->handle))
@@ -998,12 +1000,13 @@  static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 
 	intel_gvt_ops->vgpu_release(vgpu);
 
-	ret = vfio_unregister_notifier(mdev_dev(vdev->mdev), VFIO_IOMMU_NOTIFY,
+	vfio_dev = mdev_legacy_get_vfio_device(vdev->mdev);
+	ret = vfio_unregister_notifier(vfio_dev, VFIO_IOMMU_NOTIFY,
 					&vdev->iommu_notifier);
 	drm_WARN(&i915->drm, ret,
 		 "vfio_unregister_notifier for iommu failed: %d\n", ret);
 
-	ret = vfio_unregister_notifier(mdev_dev(vdev->mdev), VFIO_GROUP_NOTIFY,
+	ret = vfio_unregister_notifier(vfio_dev, VFIO_GROUP_NOTIFY,
 					&vdev->group_notifier);
 	drm_WARN(&i915->drm, ret,
 		 "vfio_unregister_notifier for group failed: %d\n", ret);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index d8589afac272f1..e1ce24d8fb2555 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -183,7 +183,7 @@  static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
 
 	private->nb.notifier_call = vfio_ccw_mdev_notifier;
 
-	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
+	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY,
 				     &events, &private->nb);
 	if (ret)
 		return ret;
@@ -204,8 +204,7 @@  static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
 
 out_unregister:
 	vfio_ccw_unregister_dev_regions(private);
-	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
-				 &private->nb);
+	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY, &private->nb);
 	return ret;
 }
 
@@ -223,7 +222,7 @@  static void vfio_ccw_mdev_close_device(struct vfio_device *vdev)
 
 	cp_free(&private->cp);
 	vfio_ccw_unregister_dev_regions(private);
-	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY, &private->nb);
+	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY, &private->nb);
 }
 
 static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private *private,
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6e08d04b605d6e..69768061cd7bd9 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1406,21 +1406,21 @@  static int vfio_ap_mdev_open_device(struct vfio_device *vdev)
 	matrix_mdev->group_notifier.notifier_call = vfio_ap_mdev_group_notifier;
 	events = VFIO_GROUP_NOTIFY_SET_KVM;
 
-	ret = vfio_register_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
-				     &events, &matrix_mdev->group_notifier);
+	ret = vfio_register_notifier(vdev, VFIO_GROUP_NOTIFY, &events,
+				     &matrix_mdev->group_notifier);
 	if (ret)
 		return ret;
 
 	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
 	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
-	ret = vfio_register_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
-				     &events, &matrix_mdev->iommu_notifier);
+	ret = vfio_register_notifier(vdev, VFIO_IOMMU_NOTIFY, &events,
+				     &matrix_mdev->iommu_notifier);
 	if (ret)
 		goto out_unregister_group;
 	return 0;
 
 out_unregister_group:
-	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
+	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
 	return ret;
 }
@@ -1430,9 +1430,9 @@  static void vfio_ap_mdev_close_device(struct vfio_device *vdev)
 	struct ap_matrix_mdev *matrix_mdev =
 		container_of(vdev, struct ap_matrix_mdev, vdev);
 
-	vfio_unregister_notifier(vdev->dev, VFIO_IOMMU_NOTIFY,
+	vfio_unregister_notifier(vdev, VFIO_IOMMU_NOTIFY,
 				 &matrix_mdev->iommu_notifier);
-	vfio_unregister_notifier(vdev->dev, VFIO_GROUP_NOTIFY,
+	vfio_unregister_notifier(vdev, VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
 	vfio_ap_mdev_unset_kvm(matrix_mdev);
 }
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index a90e24b0c851d3..91605c1e8c8f94 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -17,6 +17,18 @@ 
 
 #include "mdev_private.h"
 
+/*
+ * Return the struct vfio_device for the mdev when using the legacy
+ * vfio_mdev_dev_ops path. No new callers to this function should be added.
+ */
+struct vfio_device *mdev_legacy_get_vfio_device(struct mdev_device *mdev)
+{
+	if (WARN_ON(mdev->dev.driver != &vfio_mdev_driver.driver))
+		return NULL;
+	return dev_get_drvdata(&mdev->dev);
+}
+EXPORT_SYMBOL_GPL(mdev_legacy_get_vfio_device);
+
 static int vfio_mdev_open_device(struct vfio_device *core_vdev)
 {
 	struct mdev_device *mdev = to_mdev_device(core_vdev->dev);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a4555014bd1e72..8a5c46aa2bef61 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2484,19 +2484,15 @@  static int vfio_unregister_group_notifier(struct vfio_group *group,
 	return ret;
 }
 
-int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
+int vfio_register_notifier(struct vfio_device *dev, enum vfio_notify_type type,
 			   unsigned long *events, struct notifier_block *nb)
 {
-	struct vfio_group *group;
+	struct vfio_group *group = dev->group;
 	int ret;
 
-	if (!dev || !nb || !events || (*events == 0))
+	if (!nb || !events || (*events == 0))
 		return -EINVAL;
 
-	group = vfio_group_get_from_dev(dev);
-	if (!group)
-		return -ENODEV;
-
 	switch (type) {
 	case VFIO_IOMMU_NOTIFY:
 		ret = vfio_register_iommu_notifier(group, events, nb);
@@ -2507,25 +2503,20 @@  int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
 	default:
 		ret = -EINVAL;
 	}
-
-	vfio_group_put(group);
 	return ret;
 }
 EXPORT_SYMBOL(vfio_register_notifier);
 
-int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
+int vfio_unregister_notifier(struct vfio_device *dev,
+			     enum vfio_notify_type type,
 			     struct notifier_block *nb)
 {
-	struct vfio_group *group;
+	struct vfio_group *group = dev->group;
 	int ret;
 
-	if (!dev || !nb)
+	if (!nb)
 		return -EINVAL;
 
-	group = vfio_group_get_from_dev(dev);
-	if (!group)
-		return -ENODEV;
-
 	switch (type) {
 	case VFIO_IOMMU_NOTIFY:
 		ret = vfio_unregister_iommu_notifier(group, nb);
@@ -2536,8 +2527,6 @@  int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
 	default:
 		ret = -EINVAL;
 	}
-
-	vfio_group_put(group);
 	return ret;
 }
 EXPORT_SYMBOL(vfio_unregister_notifier);
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 15d03f6532d073..67d07220a28f29 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -29,6 +29,7 @@  static inline struct mdev_device *to_mdev_device(struct device *dev)
 unsigned int mdev_get_type_group_id(struct mdev_device *mdev);
 unsigned int mtype_get_type_group_id(struct mdev_type *mtype);
 struct device *mtype_get_parent_dev(struct mdev_type *mtype);
+struct vfio_device *mdev_legacy_get_vfio_device(struct mdev_device *mdev);
 
 /**
  * struct mdev_parent_ops - Structure to be registered for each parent device to
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 66dda06ec42d1b..748ec0e0293aea 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -178,11 +178,11 @@  enum vfio_notify_type {
 /* events for VFIO_GROUP_NOTIFY */
 #define VFIO_GROUP_NOTIFY_SET_KVM	BIT(0)
 
-extern int vfio_register_notifier(struct device *dev,
+extern int vfio_register_notifier(struct vfio_device *dev,
 				  enum vfio_notify_type type,
 				  unsigned long *required_events,
 				  struct notifier_block *nb);
-extern int vfio_unregister_notifier(struct device *dev,
+extern int vfio_unregister_notifier(struct vfio_device *dev,
 				    enum vfio_notify_type type,
 				    struct notifier_block *nb);