diff mbox series

[05/13] kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy()

Message ID 20230117134942.101112-6-yi.l.liu@intel.com (mailing list archive)
State New, archived
Headers show
Series Add vfio_device cdev for iommufd support | expand

Commit Message

Yi Liu Jan. 17, 2023, 1:49 p.m. UTC
This is to avoid a circular refcount problem between the kvm struct and
the device file. KVM modules holds device/group file reference when the
device/group is added and releases it per removal or the last kvm reference
is released. This reference model is ok for the group since there is no
kvm reference in the group paths.

But it is a problem for device file since the vfio devices may get kvm
reference in the device open path and put it in the device file release.
e.g. Intel kvmgt. This would result in a circular issue since the kvm
side won't put the device file reference if kvm reference is not 0, while
the vfio device side needs to put kvm reference in the release callback.

To solve this problem for device file, let vfio provide release() which
would be called once kvm file is closed, it won't depend on the last kvm
reference. Hence avoid circular refcount problem.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 virt/kvm/vfio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tian, Kevin Jan. 18, 2023, 8:56 a.m. UTC | #1
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev,
> u32 type);
>  static struct kvm_device_ops kvm_vfio_ops = {
>  	.name = "kvm-vfio",
>  	.create = kvm_vfio_create,
> -	.destroy = kvm_vfio_destroy,
> +	.release = kvm_vfio_destroy,

Also rename to kvm_vfio_release.
Eric Auger Jan. 19, 2023, 9:12 a.m. UTC | #2
Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This is to avoid a circular refcount problem between the kvm struct and
> the device file. KVM modules holds device/group file reference when the
> device/group is added and releases it per removal or the last kvm reference
> is released. This reference model is ok for the group since there is no
> kvm reference in the group paths.
>
> But it is a problem for device file since the vfio devices may get kvm
> reference in the device open path and put it in the device file release.
> e.g. Intel kvmgt. This would result in a circular issue since the kvm
> side won't put the device file reference if kvm reference is not 0, while
> the vfio device side needs to put kvm reference in the release callback.
>
> To solve this problem for device file, let vfio provide release() which
> would be called once kvm file is closed, it won't depend on the last kvm
> reference. Hence avoid circular refcount problem.
>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  virt/kvm/vfio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 0f54b9d308d7..525efe37ab6d 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type);
>  static struct kvm_device_ops kvm_vfio_ops = {
>  	.name = "kvm-vfio",
>  	.create = kvm_vfio_create,
> -	.destroy = kvm_vfio_destroy,
Is it safe to simply remove the destroy cb as it is called from
kvm_destroy_vm/kvm_destroy_devices?

Thanks

Eric
> +	.release = kvm_vfio_destroy,
>  	.set_attr = kvm_vfio_set_attr,
>  	.has_attr = kvm_vfio_has_attr,
>  };
Tian, Kevin Jan. 19, 2023, 9:30 a.m. UTC | #3
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, January 19, 2023 5:13 PM
> 
> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> >
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> >
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> >
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 0f54b9d308d7..525efe37ab6d 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev,
> u32 type);
> >  static struct kvm_device_ops kvm_vfio_ops = {
> >  	.name = "kvm-vfio",
> >  	.create = kvm_vfio_create,
> > -	.destroy = kvm_vfio_destroy,
> Is it safe to simply remove the destroy cb as it is called from
> kvm_destroy_vm/kvm_destroy_devices?
> 

According to the definition .release is considered as an alternative
method to free the device:

	/*
	 * Destroy is responsible for freeing dev.
	 *
	 * Destroy may be called before or after destructors are called
	 * on emulated I/O regions, depending on whether a reference is
	 * held by a vcpu or other kvm component that gets destroyed
	 * after the emulated I/O.
	 */
	void (*destroy)(struct kvm_device *dev);

	/*
	 * Release is an alternative method to free the device. It is
	 * called when the device file descriptor is closed. Once
	 * release is called, the destroy method will not be called
	 * anymore as the device is removed from the device list of
	 * the VM. kvm->lock is held.
	 */
	void (*release)(struct kvm_device *dev);

Did you see any specific problem of moving this stuff to release?
Jason Gunthorpe Jan. 19, 2023, 7:07 p.m. UTC | #4
On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> This is to avoid a circular refcount problem between the kvm struct and
> the device file. KVM modules holds device/group file reference when the
> device/group is added and releases it per removal or the last kvm reference
> is released. This reference model is ok for the group since there is no
> kvm reference in the group paths.
> 
> But it is a problem for device file since the vfio devices may get kvm
> reference in the device open path and put it in the device file release.
> e.g. Intel kvmgt. This would result in a circular issue since the kvm
> side won't put the device file reference if kvm reference is not 0, while
> the vfio device side needs to put kvm reference in the release callback.
> 
> To solve this problem for device file, let vfio provide release() which
> would be called once kvm file is closed, it won't depend on the last kvm
> reference. Hence avoid circular refcount problem.
> 
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  virt/kvm/vfio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

From Alex's remarks please revise the commit message and add a Fixes
line of some kind that this solves the deadlock Matthew was working
on, and send it stand alone right away

Jason
Alex Williamson Jan. 19, 2023, 8:04 p.m. UTC | #5
On Thu, 19 Jan 2023 15:07:01 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> > 
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> > 
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> > 
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)  
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> From Alex's remarks please revise the commit message and add a Fixes
> line of some kind that this solves the deadlock Matthew was working
> on, and send it stand alone right away

Also revise the commit log since we'll be taking a reference in the
group model as well.  The function and comments should also be updated
s/destroy/release/.  Thanks,

Alex
Yi Liu Jan. 20, 2023, 3:52 a.m. UTC | #6
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, January 19, 2023 5:30 PM
> 
> > From: Eric Auger <eric.auger@redhat.com>
> > Sent: Thursday, January 19, 2023 5:13 PM
> >
> > Hi Yi,
> >
> > On 1/17/23 14:49, Yi Liu wrote:
> > > This is to avoid a circular refcount problem between the kvm struct and
> > > the device file. KVM modules holds device/group file reference when
> the
> > > device/group is added and releases it per removal or the last kvm
> reference
> > > is released. This reference model is ok for the group since there is no
> > > kvm reference in the group paths.
> > >
> > > But it is a problem for device file since the vfio devices may get kvm
> > > reference in the device open path and put it in the device file release.
> > > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > > side won't put the device file reference if kvm reference is not 0, while
> > > the vfio device side needs to put kvm reference in the release callback.
> > >
> > > To solve this problem for device file, let vfio provide release() which
> > > would be called once kvm file is closed, it won't depend on the last kvm
> > > reference. Hence avoid circular refcount problem.
> > >
> > > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  virt/kvm/vfio.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > > index 0f54b9d308d7..525efe37ab6d 100644
> > > --- a/virt/kvm/vfio.c
> > > +++ b/virt/kvm/vfio.c
> > > @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device
> *dev,
> > u32 type);
> > >  static struct kvm_device_ops kvm_vfio_ops = {
> > >  	.name = "kvm-vfio",
> > >  	.create = kvm_vfio_create,
> > > -	.destroy = kvm_vfio_destroy,
> > Is it safe to simply remove the destroy cb as it is called from
> > kvm_destroy_vm/kvm_destroy_devices?
> >

Perhaps better to keep it. kvm_vfio_device is only one kind of kvm_device_type
For kvm_vfio_device, it is now considered to better provide a release cb.
While other kvm_device may better to have destroy cb.

> 
> According to the definition .release is considered as an alternative
> method to free the device:
> 
> 	/*
> 	 * Destroy is responsible for freeing dev.
> 	 *
> 	 * Destroy may be called before or after destructors are called
> 	 * on emulated I/O regions, depending on whether a reference is
> 	 * held by a vcpu or other kvm component that gets destroyed
> 	 * after the emulated I/O.
> 	 */
> 	void (*destroy)(struct kvm_device *dev);
> 
> 	/*
> 	 * Release is an alternative method to free the device. It is
> 	 * called when the device file descriptor is closed. Once
> 	 * release is called, the destroy method will not be called
> 	 * anymore as the device is removed from the device list of
> 	 * the VM. kvm->lock is held.
> 	 */
> 	void (*release)(struct kvm_device *dev);
> 
> Did you see any specific problem of moving this stuff to release?

It should only affect kvm_vfio_device itself. 
Yi Liu Jan. 20, 2023, 1:03 p.m. UTC | #7
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 20, 2023 3:07 AM
> 
> On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm
> reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> >
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> >
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> >
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> From Alex's remarks please revise the commit message and add a Fixes
> line of some kind that this solves the deadlock Matthew was working
> on, and send it stand alone right away

Sure. 
Yi Liu Jan. 20, 2023, 2 p.m. UTC | #8
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 20, 2023 3:07 AM
> 
> On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm
> reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> >
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> >
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> >
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> From Alex's remarks please revise the commit message and add a Fixes
> line of some kind that this solves the deadlock Matthew was working
> on, and send it stand alone right away

Hi Kevin, Jason,

I got a minor question. Let me check your opinions.

So after this change. Say we have thread A, which is the kvm-vfio device
release. It will hold the kvm_lock and delete the kvm-vfio device from
the kvm-device list. It will also call into vfio to set KVM==NULL. So it will
try to hold group_lock. The locking order is kvm_lock -> group_lock.

Say in the same time, we have thread B closes device, it will hold
group_lock first and then calls kvm_put_kvm() which is the last
reference, then it would loop the kvm-device list. Currently, it is
not holding kvm_lock. But it also manipulating the kvm-device list,
should it hold kvm_lock? If yes, then the locking order is
group_lock -> kvm_lock. Then we will have A-B B-A lock issue.

Regards,
Yi Liu
Jason Gunthorpe Jan. 20, 2023, 2:33 p.m. UTC | #9
On Fri, Jan 20, 2023 at 02:00:26PM +0000, Liu, Yi L wrote:
> Say in the same time, we have thread B closes device, it will hold
> group_lock first and then calls kvm_put_kvm() which is the last
> reference, then it would loop the kvm-device list. Currently, it is
> not holding kvm_lock. But it also manipulating the kvm-device list,
> should it hold kvm_lock? 

No. When using refcounts if the refcount is 0 it guarantees there are
no other threads that can possibly touch this memory, so any locks
internal to the memory are not required.

Jason
Yi Liu Jan. 20, 2023, 3:09 p.m. UTC | #10
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 20, 2023 10:34 PM
> 
> On Fri, Jan 20, 2023 at 02:00:26PM +0000, Liu, Yi L wrote:
> > Say in the same time, we have thread B closes device, it will hold
> > group_lock first and then calls kvm_put_kvm() which is the last
> > reference, then it would loop the kvm-device list. Currently, it is
> > not holding kvm_lock. But it also manipulating the kvm-device list,
> > should it hold kvm_lock?
> 
> No. When using refcounts if the refcount is 0 it guarantees there are
> no other threads that can possibly touch this memory, so any locks
> internal to the memory are not required.

Ok. The patch has been sent out standalone.

https://lore.kernel.org/kvm/20230114000351.115444-1-mjrosato@linux.ibm.com/T/#u

Regards,
Yi Liu
Yi Liu Jan. 20, 2023, 3:11 p.m. UTC | #11
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, January 20, 2023 11:10 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, January 20, 2023 10:34 PM
> >
> > On Fri, Jan 20, 2023 at 02:00:26PM +0000, Liu, Yi L wrote:
> > > Say in the same time, we have thread B closes device, it will hold
> > > group_lock first and then calls kvm_put_kvm() which is the last
> > > reference, then it would loop the kvm-device list. Currently, it is
> > > not holding kvm_lock. But it also manipulating the kvm-device list,
> > > should it hold kvm_lock?
> >
> > No. When using refcounts if the refcount is 0 it guarantees there are
> > no other threads that can possibly touch this memory, so any locks
> > internal to the memory are not required.
> 
> Ok. The patch has been sent out standalone.
> 
> https://lore.kernel.org/kvm/20230114000351.115444-1-
> mjrosato@linux.ibm.com/T/#u

Wrong link. Below is the correct one. 
diff mbox series

Patch

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 0f54b9d308d7..525efe37ab6d 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -364,7 +364,7 @@  static int kvm_vfio_create(struct kvm_device *dev, u32 type);
 static struct kvm_device_ops kvm_vfio_ops = {
 	.name = "kvm-vfio",
 	.create = kvm_vfio_create,
-	.destroy = kvm_vfio_destroy,
+	.release = kvm_vfio_destroy,
 	.set_attr = kvm_vfio_set_attr,
 	.has_attr = kvm_vfio_has_attr,
 };