Message ID | 20210726143524.155779-3-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] vfio/mdev: turn mdev_init into a subsys_initcall |
On Mon, Jul 26 2021, Christoph Hellwig <hch@lst.de> wrote:

> Only a single driver actually sets the ->request method, so don't print
> a scary warning if it isn't.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/vfio/mdev/mdev_core.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b16606ebafa1..b314101237fe 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -138,10 +138,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>  	if (!dev)
>  		return -EINVAL;
>
> -	/* Not mandatory, but its absence could be a problem */
> -	if (!ops->request)
> -		dev_info(dev, "Driver cannot be asked to release device\n");
> -
>  	mutex_lock(&parent_list_lock);
>
>  	/* Check for duplicate */

We also log a warning if we would like to call ->request() but none was
provided, so I think that's fine.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

But I wonder why nobody else implements this? Lack of surprise removal?
On Mon, Jul 26, 2021 at 04:35:24PM +0200, Christoph Hellwig wrote:
> Only a single driver actually sets the ->request method, so don't print
> a scary warning if it isn't.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/vfio/mdev/mdev_core.c | 4 ----
>  1 file changed, 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
> But I wonder why nobody else implements this? Lack of surprise removal?
The only implementation triggers an eventfd that seems to be the same
eventfd as the interrupt..
Do you know how this works in userspace? I'm surprised that the
interrupt eventfd can trigger an observation that the kernel driver
wants to be unplugged?
Jason
On Mon, 26 Jul 2021 20:09:06 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
>
> > But I wonder why nobody else implements this? Lack of surprise removal?
>
> The only implementation triggers an eventfd that seems to be the same
> eventfd as the interrupt..
>
> Do you know how this works in userspace? I'm surprised that the
> interrupt eventfd can trigger an observation that the kernel driver
> wants to be unplugged?

I think we're talking about ccw, but I see QEMU registering separate
eventfds for each of the 3 IRQ indexes and the mdev driver specifically
triggering the req_trigger...? Thanks,

Alex
On Mon, Jul 26 2021, Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 26 Jul 2021 20:09:06 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
>> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
>>
>> > But I wonder why nobody else implements this? Lack of surprise removal?
>>
>> The only implementation triggers an eventfd that seems to be the same
>> eventfd as the interrupt..
>>
>> Do you know how this works in userspace? I'm surprised that the
>> interrupt eventfd can trigger an observation that the kernel driver
>> wants to be unplugged?
>
> I think we're talking about ccw, but I see QEMU registering separate
> eventfds for each of the 3 IRQ indexes and the mdev driver specifically
> triggering the req_trigger...? Thanks,
>
> Alex

Exactly, ccw has a trigger for normal I/O interrupts, CRW (machine
checks), and this one.
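For readers following the eventfd question, here is a minimal userspace sketch of what "separate eventfds for each of the 3 IRQ indexes" means in practice. It is not the vfio-ccw or QEMU code: the enum and the sketch_* helpers are hypothetical names invented for illustration, and only eventfd(2), read(2), write(2) and close(2) are real Linux calls.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical index names mirroring the three triggers named above. */
enum sketch_irq_index {
	SKETCH_IO_IRQ,		/* normal I/O interrupts */
	SKETCH_CRW_IRQ,		/* CRW (machine checks) */
	SKETCH_REQ_IRQ,		/* "please release the device" */
	SKETCH_NUM_IRQS
};

struct sketch_irqs {
	int fd[SKETCH_NUM_IRQS];	/* one eventfd per index */
};

/* Close every eventfd that was successfully created. */
void sketch_irqs_close(struct sketch_irqs *irqs)
{
	for (int i = 0; i < SKETCH_NUM_IRQS; i++) {
		if (irqs->fd[i] >= 0)
			close(irqs->fd[i]);
		irqs->fd[i] = -1;
	}
}

/* Create one non-blocking eventfd per index; 0 on success, -1 on error. */
int sketch_irqs_init(struct sketch_irqs *irqs)
{
	for (int i = 0; i < SKETCH_NUM_IRQS; i++)
		irqs->fd[i] = -1;

	for (int i = 0; i < SKETCH_NUM_IRQS; i++) {
		irqs->fd[i] = eventfd(0, EFD_NONBLOCK);
		if (irqs->fd[i] < 0) {
			sketch_irqs_close(irqs);
			return -1;
		}
	}
	return 0;
}

/* Signal a single index, as a driver would for its request trigger. */
int sketch_irq_trigger(struct sketch_irqs *irqs, enum sketch_irq_index idx)
{
	uint64_t one = 1;

	if (write(irqs->fd[idx], &one, sizeof(one)) != (ssize_t)sizeof(one))
		return -1;
	return 0;
}

/* Return the pending count for one index and reset it; 0 if none pending. */
uint64_t sketch_irq_consume(struct sketch_irqs *irqs, enum sketch_irq_index idx)
{
	uint64_t val = 0;

	if (read(irqs->fd[idx], &val, sizeof(val)) != (ssize_t)sizeof(val))
		return 0;
	return val;
}
```

Because each index has its own file descriptor, signalling the request index leaves the I/O and CRW eventfds untouched, which is how userspace can tell a release request apart from an interrupt.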
On Tue, Jul 27, 2021 at 08:04:16AM +0200, Cornelia Huck wrote:
> On Mon, Jul 26 2021, Alex Williamson <alex.williamson@redhat.com> wrote:
>
> > On Mon, 26 Jul 2021 20:09:06 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> >> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
> >>
> >> > But I wonder why nobody else implements this? Lack of surprise removal?
> >>
> >> The only implementation triggers an eventfd that seems to be the same
> >> eventfd as the interrupt..
> >>
> >> Do you know how this works in userspace? I'm surprised that the
> >> interrupt eventfd can trigger an observation that the kernel driver
> >> wants to be unplugged?
> >
> > I think we're talking about ccw, but I see QEMU registering separate
> > eventfds for each of the 3 IRQ indexes and the mdev driver specifically
> > triggering the req_trigger...? Thanks,
> >
> > Alex
>
> Exactly, ccw has a trigger for normal I/O interrupts, CRW (machine
> checks), and this one.

If it is a dedicated eventfd for 'device being removed' why is it in
the CCW implementation and not core code?

Is PCI doing the same?

Jason
On Tue, 27 Jul 2021 14:32:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jul 27, 2021 at 08:04:16AM +0200, Cornelia Huck wrote:
> > On Mon, Jul 26 2021, Alex Williamson <alex.williamson@redhat.com> wrote:
> >
> > > On Mon, 26 Jul 2021 20:09:06 -0300
> > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > >> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
> > >>
> > >> > But I wonder why nobody else implements this? Lack of surprise removal?
> > >>
> > >> The only implementation triggers an eventfd that seems to be the same
> > >> eventfd as the interrupt..
> > >>
> > >> Do you know how this works in userspace? I'm surprised that the
> > >> interrupt eventfd can trigger an observation that the kernel driver
> > >> wants to be unplugged?
> > >
> > > I think we're talking about ccw, but I see QEMU registering separate
> > > eventfds for each of the 3 IRQ indexes and the mdev driver specifically
> > > triggering the req_trigger...? Thanks,
> > >
> > > Alex
> >
> > Exactly, ccw has a trigger for normal I/O interrupts, CRW (machine
> > checks), and this one.
>
> If it is a dedicated eventfd for 'device being removed' why is it in
> the CCW implementation and not core code?

The CCW implementation (likewise the vfio-pci implementation) owns
the IRQ index address space and the decision to make this a signal
to userspace rather than perhaps some handling a device might be
able to do internally. For instance an alternate vfio-pci
implementation might zap all mmaps, block all r/w access, and turn
this into a surprise removal. Another implementation might be more
aggressive to sending SIGKILL to the user process. This was the
thought behind why vfio-core triggers the driver request callback with
a counter, leaving the policy to the driver.

> Is PCI doing the same?

Yes, that's where this handling originated. Thanks,

Alex
On Tue, Jul 27, 2021 at 12:53:09PM -0600, Alex Williamson wrote:
> On Tue, 27 Jul 2021 14:32:09 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Tue, Jul 27, 2021 at 08:04:16AM +0200, Cornelia Huck wrote:
> > > On Mon, Jul 26 2021, Alex Williamson <alex.williamson@redhat.com> wrote:
> > >
> > > > On Mon, 26 Jul 2021 20:09:06 -0300
> > > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > >
> > > >> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
> > > >>
> > > >> > But I wonder why nobody else implements this? Lack of surprise removal?
> > > >>
> > > >> The only implementation triggers an eventfd that seems to be the same
> > > >> eventfd as the interrupt..
> > > >>
> > > >> Do you know how this works in userspace? I'm surprised that the
> > > >> interrupt eventfd can trigger an observation that the kernel driver
> > > >> wants to be unplugged?
> > > >
> > > > I think we're talking about ccw, but I see QEMU registering separate
> > > > eventfds for each of the 3 IRQ indexes and the mdev driver specifically
> > > > triggering the req_trigger...? Thanks,
> > > >
> > > > Alex
> > >
> > > Exactly, ccw has a trigger for normal I/O interrupts, CRW (machine
> > > checks), and this one.
> >
> > If it is a dedicated eventfd for 'device being removed' why is it in
> > the CCW implementation and not core code?
>
> The CCW implementation (likewise the vfio-pci implementation) owns
> the IRQ index address space and the decision to make this a signal
> to userspace rather than perhaps some handling a device might be
> able to do internally.

The core code holds the vfio_device_get() so long as the FD is
open. There is no way to pass the wait_for_completion without
userspace closing the FD, so there isn't really much choice for the
drivers to do beyond signal to userspace to close the FD??

> For instance an alternate vfio-pci implementation might zap all
> mmaps, block all r/w access, and turn this into a surprise removal.

This is nice, but wouldn't close the FD, so needs core changes
anyhow..

> Another implementation might be more aggressive to sending SIGKILL
> to the user process.

We don't try to revoke FDs from the kernel, it is racy, dangerous and
unreliable.

> This was the thought behind why vfio-core triggers the driver
> request callback with a counter, leaving the policy to the driver.

IMHO subsystem policy does not belong in drivers. Down that road lies
a mess for userspace.

Jason
On Tue, 27 Jul 2021 16:03:17 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jul 27, 2021 at 12:53:09PM -0600, Alex Williamson wrote:
> > On Tue, 27 Jul 2021 14:32:09 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > On Tue, Jul 27, 2021 at 08:04:16AM +0200, Cornelia Huck wrote:
> > > > On Mon, Jul 26 2021, Alex Williamson <alex.williamson@redhat.com> wrote:
> > > >
> > > > > On Mon, 26 Jul 2021 20:09:06 -0300
> > > > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > > >
> > > > >> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
> > > > >>
> > > > >> > But I wonder why nobody else implements this? Lack of surprise removal?
> > > > >>
> > > > >> The only implementation triggers an eventfd that seems to be the same
> > > > >> eventfd as the interrupt..
> > > > >>
> > > > >> Do you know how this works in userspace? I'm surprised that the
> > > > >> interrupt eventfd can trigger an observation that the kernel driver
> > > > >> wants to be unplugged?
> > > > >
> > > > > I think we're talking about ccw, but I see QEMU registering separate
> > > > > eventfds for each of the 3 IRQ indexes and the mdev driver specifically
> > > > > triggering the req_trigger...? Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > Exactly, ccw has a trigger for normal I/O interrupts, CRW (machine
> > > > checks), and this one.
> > >
> > > If it is a dedicated eventfd for 'device being removed' why is it in
> > > the CCW implementation and not core code?
> >
> > The CCW implementation (likewise the vfio-pci implementation) owns
> > the IRQ index address space and the decision to make this a signal
> > to userspace rather than perhaps some handling a device might be
> > able to do internally.
>
> The core code holds the vfio_device_get() so long as the FD is
> open. There is no way to pass the wait_for_completion without
> userspace closing the FD, so there isn't really much choice for the
> drivers to do beyond signal to userspace to close the FD??
>
> > For instance an alternate vfio-pci implementation might zap all
> > mmaps, block all r/w access, and turn this into a surprise removal.
>
> This is nice, but wouldn't close the FD, so needs core changes
> anyhow..

Right, the core would need to be able to handle an FD disconnected from
the device, obviously some core changes would be required.

> > Another implementation might be more aggressive to sending SIGKILL
> > to the user process.
>
> We don't try to revoke FDs from the kernel, it is racy, dangerous and
> unreliable.

I'm not sure how trying to kill the process using an open file becomes
a revoke... In fact, the surprise hotplug might just be able to zap
mmaps and wait for userspace to generate a SIGBUS.

> > This was the thought behind why vfio-core triggers the driver
> > request callback with a counter, leaving the policy to the driver.
>
> IMHO subsystem policy does not belong in drivers. Down that road lies
> a mess for userspace.

I think my argument was that to this point it's been driver policy, not
subsystem policy. The subsystem policy is to block until the device is
released, it's the driver policy whether it has a means to implement
something to expedite that. Thanks,

Alex
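The split Alex describes (the core blocks until the device is released; the driver decides whether and how to hurry that along) can be modelled with a small userspace sketch. This is not the vfio core code: every sketch_* name is hypothetical, open_fds merely stands in for the references held while userspace keeps the FD open, and the max_rounds bound exists only so the example terminates, whereas the thread describes the real core as blocking until release.

```c
#include <stddef.h>

struct sketch_device;

struct sketch_ops {
	/* Optional, like ->request in the patch: may be NULL. */
	void (*request)(struct sketch_device *dev, unsigned int count);
};

struct sketch_device {
	const struct sketch_ops *ops;
	int open_fds;			/* references held by open FDs */
	unsigned int requests_seen;	/* how often the driver was asked */
};

/*
 * One possible driver policy: forward the request to "userspace", which
 * in this toy model ignores the first request (count 0) and closes its
 * FD on any later one.
 */
void sketch_polite_request(struct sketch_device *dev, unsigned int count)
{
	dev->requests_seen++;
	if (count >= 1 && dev->open_fds > 0)
		dev->open_fds--;
}

/*
 * Subsystem policy: keep waiting until nothing holds the device open,
 * asking the driver (if it can be asked) once per round with an
 * increasing counter. Returns the number of rounds waited, or -1 if
 * the device was still held after max_rounds rounds.
 */
int sketch_core_unregister(struct sketch_device *dev, unsigned int max_rounds)
{
	unsigned int count = 0;

	while (dev->open_fds > 0) {
		if (count == max_rounds)
			return -1;
		if (dev->ops && dev->ops->request)
			dev->ops->request(dev, count);
		count++;
	}
	return (int)count;
}
```

A driver without a request callback simply leaves the core waiting, which is the situation the removed dev_info() message used to announce at registration time.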
Only a single driver actually sets the ->request method, so don't print
a scary warning if it isn't.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/vfio/mdev/mdev_core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b16606ebafa1..b314101237fe 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -138,10 +138,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	if (!dev)
 		return -EINVAL;
 
-	/* Not mandatory, but its absence could be a problem */
-	if (!ops->request)
-		dev_info(dev, "Driver cannot be asked to release device\n");
-
 	mutex_lock(&parent_list_lock);
 
 	/* Check for duplicate */