
[v12,07/24] vfio: Block device access via device fd until device is opened

Message ID 20230602121653.80017-8-yi.l.liu@intel.com (mailing list archive)
State New, archived
Series Add vfio_device cdev for iommufd support

Commit Message

Liu, Yi L June 2, 2023, 12:16 p.m. UTC
Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This in-between state is not used when the device
FD is spawned from the group FD; however, when the device FD is created
directly by opening a cdev, it will be opened in the blocked state.

The reason for the in-between state is that userspace only gets an FD
but doesn't gain access permission until it binds the FD to an iommufd.
So in the blocked state, only the bind operation is allowed. Completing
bind allows the user to further access the device.
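A minimal user-space sketch of the blocked state described above, assuming a hypothetical command set in which a bind request is handled before the access check and every other request is refused until bind succeeds. The names demo_cmd, demo_bind() and demo_ioctl() are illustrative only and are not the VFIO uAPI; the real bind (attaching the FD to an iommufd and opening the device) is reduced to setting the flag, and the sketch is single-threaded, so it ignores memory ordering.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command codes; these are not the real VFIO ioctl numbers. */
enum demo_cmd {
	DEMO_CMD_BIND,		/* the one request allowed while blocked */
	DEMO_CMD_GET_INFO,	/* stands in for any other device operation */
};

struct demo_device_file {
	bool access_granted;	/* false while the FD is in the blocked state */
};

/* Stand-in for binding to an iommufd and calling .open_device(). */
static int demo_bind(struct demo_device_file *df)
{
	if (df->access_granted)
		return -EINVAL;	/* assumption: a second bind is rejected */
	df->access_granted = true;
	return 0;
}

static int demo_ioctl(struct demo_device_file *df, enum demo_cmd cmd)
{
	/* Bind is permitted before access has been granted. */
	if (cmd == DEMO_CMD_BIND)
		return demo_bind(df);

	/* Blocked state: every other request fails with -EINVAL. */
	if (!df->access_granted)
		return -EINVAL;

	return 0;	/* pretend the device operation succeeded */
}
```

Before bind, DEMO_CMD_GET_INFO returns -EINVAL; after a successful bind the same request returns 0.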

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state. The flag is set with smp_store_release() once the
device has been opened and read with smp_load_acquire() on each access
path, so all of the device setup is ordered before any access by a
thread that observes the flag set.
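The store-release/load-acquire pairing can be approximated in user space with C11 atomics. This is a sketch, assuming memory_order_release and memory_order_acquire as stand-ins for the kernel's smp_store_release() and smp_load_acquire(); struct demo_df and its functions are hypothetical, and device_state represents whatever setup vfio_df_open() performs.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space analogue of struct vfio_device_file. */
struct demo_df {
	int device_state;		/* set up before access is granted */
	_Atomic uint8_t access_granted;	/* 0 = blocked, 1 = usable */
};

/*
 * Writer side: finish all setup first, then publish the flag with a
 * release store (the role smp_store_release() plays in the patch).
 */
static void demo_grant(struct demo_df *df, int state)
{
	df->device_state = state;
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/*
 * Reader side: an acquire load (the role of smp_load_acquire()). If it
 * observes 1, the setup written before the release store is visible too.
 */
static int demo_read_state(struct demo_df *df, int *out)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	*out = df->device_state;
	return 0;
}
```

The design point is that readers take no lock: the only synchronization is the single release store on the writer side and one acquire load per access.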

With this lockless scheme, the device FD can safely move from unbound
to bound, but not from bound to unbound: allowing that would require a
lock on all the vfio ioctls, which seems costly. So once the device FD
is bound, it remains bound until the FD is closed.
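A sequential simulation of one interleaving shows why bound->unbound is ruled out. It assumes a hypothetical demo_hypothetical_unbind(), which the patch deliberately does not provide: a reader that has already passed the lockless check goes on to use state that the unbind has torn down. Teardown is modeled here by clearing a pointer, so the stale access shows up as -EFAULT instead of a real use-after-free.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct demo_df {
	_Atomic uint8_t access_granted;
	int *resource;		/* stands in for state set up by open/bind */
};

/* Reader step 1: the lockless gate run at the top of each file operation. */
static int demo_reader_check(struct demo_df *df)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	return 0;
}

/* Reader step 2: use the device; the flag is not re-checked here. */
static int demo_reader_use(struct demo_df *df)
{
	if (!df->resource)
		return -EFAULT;	/* without this check: access to torn-down state */
	return *df->resource;
}

/*
 * Hypothetical unbind (NOT part of the patch): clearing the flag cannot
 * stop a reader that already passed demo_reader_check().
 */
static void demo_hypothetical_unbind(struct demo_df *df)
{
	atomic_store_explicit(&df->access_granted, 0, memory_order_release);
	df->resource = NULL;
}
```

Closing the window would need every reader to hold a lock across both steps, which is the per-ioctl cost the commit message avoids.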

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 11 ++++++++++-
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 16 ++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

Comments

Alex Williamson June 12, 2023, 9:52 p.m. UTC | #1
On Fri,  2 Jun 2023 05:16:36 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index caf53716ddb2..088dd34c8931 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
>  	df->iommufd = device->group->iommufd;
>  
>  	ret = vfio_df_open(df);
> -	if (ret)
> +	if (ret) {
>  		df->iommufd = NULL;
> +		goto out_put_kvm;
> +	}
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap and vfio_file_has_device_access()
> +	 */
> +	smp_store_release(&df->access_granted, true);
>  
> +out_put_kvm:
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
>  
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index f9eb52eb9ed7..fdf2fc73f880 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,7 @@ struct vfio_container;
>  
>  struct vfio_device_file {
>  	struct vfio_device *device;
> +	bool access_granted;

Should we make this a more strongly defined data type and later move
devid (u32) here to partially fill the hole created?

I think this is being placed towards the front of the data structure
for cache line locality given this is a hot path for file operations.
But bool types have an implementation-defined size, making them
difficult to pack.  Also there will be a tendency to want to make this
a bit field, which is probably not compatible with the smp lockless
operations being used here.  We might get in front of these issues if
we just define it as a u8 now.  Thanks,

Alex

>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index a3c5817fc545..4c8b7713dc3d 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	ret = vfio_device_pm_runtime_get(device);
>  	if (ret)
>  		return ret;
> @@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
>  
> @@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
>  
> @@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;
>
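Alex's point about the hole left by a 1-byte flag, and about moving devid (u32) next to it, can be illustrated with two stand-in layouts. This is a sketch: demo_ptr_aligned is a hypothetical placeholder for a member that needs pointer alignment (for example a lock that grows debug state), uint8_t plays the role of the kernel's u8, and exact sizes depend on the ABI; the padding comments assume a typical LP64 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a member needing pointer alignment. */
struct demo_ptr_aligned {
	void *p;
};

/* Flag followed directly by a pointer-aligned member. */
struct demo_layout_hole {
	void *device;
	uint8_t access_granted;		/* LP64: 7 bytes of padding follow */
	struct demo_ptr_aligned lock;
	uint32_t devid;			/* LP64: 4 bytes of tail padding */
};

/* devid moved next to the flag so it reuses the padding. */
struct demo_layout_packed {
	void *device;
	uint8_t access_granted;		/* LP64: 3 bytes of padding follow */
	uint32_t devid;
	struct demo_ptr_aligned lock;
};

/* A fixed-width type makes the flag's size explicit, unlike bool. */
static_assert(sizeof(((struct demo_layout_packed *)0)->access_granted) == 1,
	      "access_granted should be exactly one byte");
```

On LP64, demo_layout_hole is 32 bytes and demo_layout_packed is 24 bytes.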
Liu, Yi L June 13, 2023, 5:46 a.m. UTC | #2
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 5:52 AM
> 
> On Fri,  2 Jun 2023 05:16:36 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index f9eb52eb9ed7..fdf2fc73f880 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,7 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> > +	bool access_granted;
> 
> Should we make this a more strongly defined data type and later move
> devid (u32) here to partially fill the hole created?

Before answering your question, let me describe how I placed the fields
of this structure, to check whether it is common practice. The first two
fields are static, so they come first. access_granted is lockless, while
the other fields are protected by locks, so I tried to keep each lock
close to the fields it protects. This is why I put devid after iommufd,
as both are protected by the same lock.

struct vfio_device_file {
        struct vfio_device *device;
        struct vfio_group *group;

        bool access_granted;
        spinlock_t kvm_ref_lock; /* protect kvm field */
        struct kvm *kvm;
        struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
        u32 devid; /* only valid when iommufd is valid */
};

> 
> I think this is being placed towards the front of the data structure
> for cache line locality given this is a hot path for file operations.
> But bool types have an implementation dependent size, making them
> difficult to pack.  Also there will be a tendency to want to make this
> a bit field, which is probably not compatible with the smp lockless
> operations being used here.  We might get in front of these issues if
> we just define it as a u8 now.  Thanks,

I don't quite get why a bit field would be incompatible with the smp
lockless operations. Could you elaborate a bit? And should I define
access_granted as u8 or as "u8:1"?

Regards,
Yi Liu

Alex Williamson June 13, 2023, 2:16 p.m. UTC | #3
On Tue, 13 Jun 2023 05:46:32 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 5:52 AM
> > 
> > On Fri,  2 Jun 2023 05:16:36 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > --- a/drivers/vfio/vfio.h
> > > +++ b/drivers/vfio/vfio.h
> > > @@ -18,6 +18,7 @@ struct vfio_container;
> > >
> > >  struct vfio_device_file {
> > >  	struct vfio_device *device;
> > > +	bool access_granted;  
> > 
> > Should we make this a more strongly defined data type and later move
> > devid (u32) here to partially fill the hole created?  
> 
> Before your question, let me describe how I place the fields
> of this structure to see if it is common practice. The first two
> fields are static, so they are in the beginning. The access_granted
> is lockless and other fields are protected by locks. So I tried to
> put the lock and the fields it protects closely. So this is why I put
> devid behind iommufd as both are protected by the same lock.

I think the primary considerations are locality and compactness.  Hot-path
data should be within the first cache line of the structure,
related data should share a cache line, and we should use the space
efficiently.  What you describe seems largely an aesthetic concern,
which was not evident to me from the segmentation alone.
 
> struct vfio_device_file {
>         struct vfio_device *device;
>         struct vfio_group *group;
> 
>         bool access_granted;
>         spinlock_t kvm_ref_lock; /* protect kvm field */
>         struct kvm *kvm;
>         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>         u32 devid; /* only valid when iommufd is valid */
> };
> 
> > 
> > I think this is being placed towards the front of the data structure
> > for cache line locality given this is a hot path for file operations.
> > But bool types have an implementation dependent size, making them
> > difficult to pack.  Also there will be a tendency to want to make this
> > a bit field, which is probably not compatible with the smp lockless
> > operations being used here.  We might get in front of these issues if
> > we just define it as a u8 now.  Thanks,  
> 
> Not quite get why bit field is going to be incompatible with smp
> lockless operations. Could you elaborate a bit? And should I define
> the access_granted as u8 or "u8:1"?

Perhaps FUD on my part, but load-acquire type operations have specific
semantics and it's not clear to me how they interact with
compiler-generated bit operations.  Thanks,

Alex
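One way to make the concern concrete: C does not allow taking the address of a bit-field, so smp_load_acquire(&df->bitfield) could not even be written, and a plain store to one bit is a read-modify-write of its whole storage unit, which can race with updates to neighboring bits. If several flags ever had to share a word, a sketch that keeps release/acquire semantics uses an atomic integer with explicit masks. This uses C11 atomics in user space, and the flag names are hypothetical, not VFIO definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, not VFIO definitions. */
#define DEMO_F_ACCESS_GRANTED	(1u << 0)
#define DEMO_F_OTHER		(1u << 1)

struct demo_df_flags {
	/*
	 * Not: unsigned int access_granted : 1;
	 * &flags->access_granted would not compile, and a store to it would
	 * rewrite the neighboring bits non-atomically.
	 */
	_Atomic uint32_t bits;
};

/* Atomic OR: setting one bit cannot lose a concurrent update to another. */
static void demo_flag_set(struct demo_df_flags *f, uint32_t bit)
{
	atomic_fetch_or_explicit(&f->bits, bit, memory_order_release);
}

/* Acquire load pairs with the release in demo_flag_set(). */
static bool demo_flag_test(struct demo_df_flags *f, uint32_t bit)
{
	return (atomic_load_explicit(&f->bits, memory_order_acquire) & bit) != 0;
}
```

For a single flag, as suggested here, a standalone u8 is simpler: it has its own address, so smp_store_release() and smp_load_acquire() apply to it directly.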

Liu, Yi L June 13, 2023, 2:36 p.m. UTC | #4
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:17 PM
> 
> On Tue, 13 Jun 2023 05:46:32 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 5:52 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:36 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > --- a/drivers/vfio/vfio.h
> > > > +++ b/drivers/vfio/vfio.h
> > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > >
> > > >  struct vfio_device_file {
> > > >  	struct vfio_device *device;
> > > > +	bool access_granted;
> > >
> > > Should we make this a more strongly defined data type and later move
> > > devid (u32) here to partially fill the hole created?
> >
> > Before your question, let me describe how I place the fields
> > of this structure to see if it is common practice. The first two
> > fields are static, so they are in the beginning. The access_granted
> > is lockless and other fields are protected by locks. So I tried to
> > put the lock and the fields it protects closely. So this is why I put
> > devid behind iommufd as both are protected by the same lock.
> 
> I think the primary considerations are locality and compactness.  Hot
> paths data should be within the first cache line of the structure,
> related data should share a cache line, and we should use the space
> efficiently.  What you describe seems largely an aesthetic concern,
> which was not evident to me by the segmentation alone.

Sure.

> 
> > struct vfio_device_file {
> >         struct vfio_device *device;
> >         struct vfio_group *group;
> >
> >         bool access_granted;
> >         spinlock_t kvm_ref_lock; /* protect kvm field */
> >         struct kvm *kvm;
> >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> >         u32 devid; /* only valid when iommufd is valid */
> > };
> >
> > >
> > > I think this is being placed towards the front of the data structure
> > > for cache line locality given this is a hot path for file operations.
> > > But bool types have an implementation dependent size, making them
> > > difficult to pack.  Also there will be a tendency to want to make this
> > > a bit field, which is probably not compatible with the smp lockless
> > > operations being used here.  We might get in front of these issues if
> > > we just define it as a u8 now.  Thanks,
> >
> > Not quite get why bit field is going to be incompatible with smp
> > lockless operations. Could you elaborate a bit? And should I define
> > the access_granted as u8 or "u8:1"?
> 
> Perhaps FUD on my part, but load-acquire type operations have specific
> semantics and it's not clear to me how they interact with
> compiler-generated bit operations.  Thanks,

I see. How about the layout below?

struct vfio_device_file {
        struct vfio_device *device;
        struct vfio_group *group;
        u8 access_granted;
        u32 devid; /* only valid when iommufd is valid */
        spinlock_t kvm_ref_lock; /* protect kvm field */
        struct kvm *kvm;
        struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
};

Regards,
Yi Liu
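The proposed layout can be sanity-checked at compile time. This is a user-space sketch with stand-ins: demo_spinlock_t is a hypothetical 4-byte placeholder for spinlock_t (whose real size depends on kernel lock-debugging options), the kernel structs are left opaque, and a 64-byte cache line is assumed. In a kernel build, pahole is the usual tool for inspecting holes in the real structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins for the kernel types. */
struct vfio_device;
struct vfio_group;
struct kvm;
struct iommufd_ctx;

/* Hypothetical placeholder; the real spinlock_t grows with lock debugging. */
typedef struct {
	uint32_t raw;
} demo_spinlock_t;

#define DEMO_CACHE_LINE 64	/* assumption: a common cache-line size */

struct demo_vfio_device_file {
	struct vfio_device *device;
	struct vfio_group *group;
	uint8_t access_granted;
	uint32_t devid;		/* only valid when iommufd is valid */
	demo_spinlock_t kvm_ref_lock;
	struct kvm *kvm;
	struct iommufd_ctx *iommufd;
};

/* The hot-path flag must land in the first cache line. */
static_assert(offsetof(struct demo_vfio_device_file, access_granted) +
	      sizeof(uint8_t) <= DEMO_CACHE_LINE,
	      "access_granted is outside the first cache line");

/* devid should reuse the padding right after the 1-byte flag. */
static_assert(offsetof(struct demo_vfio_device_file, devid) ==
	      2 * sizeof(void *) + 4,
	      "devid does not fill the hole after access_granted");
```

If either assertion fails after a later reshuffle, the build breaks instead of silently reintroducing the hole.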

Alex Williamson June 13, 2023, 2:42 p.m. UTC | #5
On Tue, 13 Jun 2023 14:36:14 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:17 PM
> > 
> > On Tue, 13 Jun 2023 05:46:32 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 5:52 AM
> > > >
> > > > On Fri,  2 Jun 2023 05:16:36 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > > --- a/drivers/vfio/vfio.h
> > > > > +++ b/drivers/vfio/vfio.h
> > > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > > >
> > > > >  struct vfio_device_file {
> > > > >  	struct vfio_device *device;
> > > > > +	bool access_granted;  
> > > >
> > > > Should we make this a more strongly defined data type and later move
> > > > devid (u32) here to partially fill the hole created?  
> > >
> > > Before answering your question, let me describe how I ordered the
> > > fields of this structure, to check whether it is common practice. The
> > > first two fields are static, so they come first. access_granted is
> > > lockless and the other fields are protected by locks, so I tried to
> > > keep each lock close to the fields it protects. This is why I put
> > > devid after iommufd: both are protected by the same lock.  
> > 
> > I think the primary considerations are locality and compactness.  Hot-path
> > data should be within the first cache line of the structure,
> > related data should share a cache line, and we should use the space
> > efficiently.  What you describe seems largely an aesthetic concern,
> > which was not evident to me from the segmentation alone.  
> 
> Sure.
> 
> >   
> > > struct vfio_device_file {
> > >         struct vfio_device *device;
> > >         struct vfio_group *group;
> > >
> > >         bool access_granted;
> > >         spinlock_t kvm_ref_lock; /* protect kvm field */
> > >         struct kvm *kvm;
> > >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > >         u32 devid; /* only valid when iommufd is valid */
> > > };
> > >  
> > > >
> > > > I think this is being placed towards the front of the data structure
> > > > for cache line locality given this is a hot path for file operations.
> > > > But bool types have an implementation dependent size, making them
> > > > difficult to pack.  Also there will be a tendency to want to make this
> > > > a bit field, which is probably not compatible with the smp lockless
> > > > operations being used here.  We might get in front of these issues if
> > > > we just define it as a u8 now.  Thanks,  
> > >
> > > I don't quite see why a bit field would be incompatible with smp
> > > lockless operations. Could you elaborate a bit? And should I define
> > > the access_granted as u8 or "u8:1"?  
> > 
> > Perhaps FUD on my part, but load-acquire type operations have specific
> > semantics and it's not clear to me that they interact with compiler
> > generated bit operations.  Thanks,  
> 
> I see. How about the layout below?
> 
> struct vfio_device_file {
>         struct vfio_device *device;
>         struct vfio_group *group;
>         u8 access_granted;
>         u32 devid; /* only valid when iommufd is valid */
>         spinlock_t kvm_ref_lock; /* protect kvm field */
>         struct kvm *kvm;
>         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> };

Yep, that's essentially what I was suggesting.  Thanks,

Alex
Liu, Yi L June 13, 2023, 2:44 p.m. UTC | #6
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:42 PM
> 
> On Tue, 13 Jun 2023 14:36:14 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > > > > >
> > > > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > > > --- a/drivers/vfio/vfio.h
> > > > > > +++ b/drivers/vfio/vfio.h
> > > > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > > > >
> > > > > >  struct vfio_device_file {
> > > > > >  	struct vfio_device *device;
> > > > > > +	bool access_granted;
> > > > >
> > > > > Should we make this a more strongly defined data type and later move
> > > > > devid (u32) here to partially fill the hole created?
> > > >
> > > > Before answering your question, let me describe how I ordered the
> > > > fields of this structure, to check whether it is common practice. The
> > > > first two fields are static, so they come first. access_granted is
> > > > lockless and the other fields are protected by locks, so I tried to
> > > > keep each lock close to the fields it protects. This is why I put
> > > > devid after iommufd: both are protected by the same lock.
> > >
> > > I think the primary considerations are locality and compactness.  Hot-path
> > > data should be within the first cache line of the structure,
> > > related data should share a cache line, and we should use the space
> > > efficiently.  What you describe seems largely an aesthetic concern,
> > > which was not evident to me from the segmentation alone.
> >
> > Sure.
> >
> > >
> > > > struct vfio_device_file {
> > > >         struct vfio_device *device;
> > > >         struct vfio_group *group;
> > > >
> > > >         bool access_granted;
> > > >         spinlock_t kvm_ref_lock; /* protect kvm field */
> > > >         struct kvm *kvm;
> > > >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > > >         u32 devid; /* only valid when iommufd is valid */
> > > > };
> > > >
> > > > >
> > > > > I think this is being placed towards the front of the data structure
> > > > > for cache line locality given this is a hot path for file operations.
> > > > > But bool types have an implementation dependent size, making them
> > > > > difficult to pack.  Also there will be a tendency to want to make this
> > > > > a bit field, which is probably not compatible with the smp lockless
> > > > > operations being used here.  We might get in front of these issues if
> > > > > we just define it as a u8 now.  Thanks,
> > > >
> > > > I don't quite see why a bit field would be incompatible with smp
> > > > lockless operations. Could you elaborate a bit? And should I define
> > > > the access_granted as u8 or "u8:1"?
> > >
> > > Perhaps FUD on my part, but load-acquire type operations have specific
> > > semantics and it's not clear to me that they interact with compiler
> > > generated bit operations.  Thanks,
> >
> > I see. How about the layout below?
> >
> > struct vfio_device_file {
> >         struct vfio_device *device;
> >         struct vfio_group *group;
> >         u8 access_granted;
> >         u32 devid; /* only valid when iommufd is valid */
> >         spinlock_t kvm_ref_lock; /* protect kvm field */
> >         struct kvm *kvm;
> >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > };
> 
> Yep, that's essentially what I was suggesting.  Thanks,

Got it. 
Jason Gunthorpe June 13, 2023, 5:19 p.m. UTC | #7
On Tue, Jun 13, 2023 at 08:16:47AM -0600, Alex Williamson wrote:

> > I don't quite see why a bit field would be incompatible with smp
> > lockless operations. Could you elaborate a bit? And should I define
> > the access_granted as u8 or "u8:1"?
> 
> Perhaps FUD on my part, but load-acquire type operations have specific
> semantics and it's not clear to me that they interact with compiler
> generated bit operations.  Thanks,

They won't compile if you target a bitfield with them; you can't take
the address of a bitfield.

Jason
Alex Williamson June 13, 2023, 5:31 p.m. UTC | #8
On Tue, 13 Jun 2023 14:19:17 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jun 13, 2023 at 08:16:47AM -0600, Alex Williamson wrote:
> 
> > > I don't quite see why a bit field would be incompatible with smp
> > > lockless operations. Could you elaborate a bit? And should I define
> > > the access_granted as u8 or "u8:1"?  
> > 
> > Perhaps FUD on my part, but load-acquire type operations have specific
> > semantics and it's not clear to me that they interact with compiler
> > generated bit operations.  Thanks,  
> 
> They won't compile if you target a bitfield with them; you can't take
> the address of a bitfield.

Yup, that's what I was assuming but was too lazy to prove it.  Thanks,

Alex

Patch

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index caf53716ddb2..088dd34c8931 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	df->iommufd = device->group->iommufd;
 
 	ret = vfio_df_open(df);
-	if (ret)
+	if (ret) {
 		df->iommufd = NULL;
+		goto out_put_kvm;
+	}
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap and vfio_file_has_device_access()
+	 */
+	smp_store_release(&df->access_granted, true);
 
+out_put_kvm:
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f9eb52eb9ed7..fdf2fc73f880 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a3c5817fc545..4c8b7713dc3d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
 
@@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
 
@@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;