Message ID | e7d35d7730f3f83417e757bc264a470f8c2671ed.1706849424.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | vfio/pci: Remove duplicate code and logic from VFIO PCI interrupt management |
On Thu, 1 Feb 2024 20:56:57 -0800
Reinette Chatre <reinette.chatre@intel.com> wrote:

> vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
> via the VFIO_DEVICE_SET_IRQS ioctl(). The igate mutex is obtained
> before calling vfio_pci_set_irqs_ioctl() for management of all interrupt
> types to protect against concurrent changes to the eventfds associated
> with device request notification and error interrupts.
>
> The igate mutex is not acquired consistently. The mutex is always
> (for all interrupt types) acquired from within vfio_pci_ioctl_set_irqs()
> before calling vfio_pci_set_irqs_ioctl(), but vfio_pci_set_irqs_ioctl() is
> called via vfio_pci_core_disable() without the mutex held. The latter
> is expected to be correct if the code flow can be guaranteed that
> the provided interrupt type is not a device request notification or error
> interrupt.

The latter is correct because it's always a physical interrupt type
(INTx/MSI/MSIX), vdev->irq_type dictates this, and the interrupt code
prevents the handler from being called after the interrupt is disabled.

It's intentional that we don't acquire igate here since we only need to
prevent a race with concurrent user access, which cannot occur in the
fd release path. The igate mutex is acquired consistently, where it's
required.

It would be more forthcoming to describe that potential future emulated
device interrupts don't make the same guarantees, but if that's true,
why can't they?

> Move igate mutex acquire and release into vfio_pci_set_irqs_ioctl()
> to make the locking consistent irrespective of interrupt type.
> This is one step closer to contain the interrupt management locking
> internals within the interrupt management code so that the VFIO PCI
> core can trigger management of the eventfds associated with device
> request notification and error interrupts without needing to access
> and manipulate VFIO interrupt management locks and data.

If all we want to do is move the mutex into vfio_pci_intrs.c then we
could rename to __vfio_pci_set_irqs_ioctl() and create a wrapper around
it that grabs the mutex. The disable path could use the lockless
version and we wouldn't need to clutter the exit path unlocking the
mutex as done below. Thanks,

Alex

> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Note to maintainers:
> Originally formed part of the IMS submission below, but is not
> specific to IMS.
> https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com
>
>  drivers/vfio/pci/vfio_pci_core.c  |  3 ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 10 ++++++++--
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1cbc990d42e0..d2847ca2f0cb 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1214,12 +1214,9 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
>  			return PTR_ERR(data);
>  	}
>
> -	mutex_lock(&vdev->igate);
> -
>  	ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
>  				      hdr.count, data);
>
> -	mutex_unlock(&vdev->igate);
>  	kfree(data);
>
>  	return ret;
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 69ab11863282..97a3bb22b186 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -793,7 +793,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
>  	int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
>  		    unsigned int start, unsigned int count, uint32_t flags,
>  		    void *data) = NULL;
> +	int ret = -ENOTTY;
>
> +	mutex_lock(&vdev->igate);
>  	switch (index) {
>  	case VFIO_PCI_INTX_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> @@ -838,7 +840,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
>  	}
>
>  	if (!func)
> -		return -ENOTTY;
> +		goto out_unlock;
> +
> +	ret = func(vdev, index, start, count, flags, data);
> +out_unlock:
> +	mutex_unlock(&vdev->igate);
> +	return ret;
>
> -	return func(vdev, index, start, count, flags, data);
>  }
Hi Alex,

On 2/5/2024 2:34 PM, Alex Williamson wrote:
> On Thu, 1 Feb 2024 20:56:57 -0800
> Reinette Chatre <reinette.chatre@intel.com> wrote:
>
>> vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
>> via the VFIO_DEVICE_SET_IRQS ioctl(). The igate mutex is obtained
>> before calling vfio_pci_set_irqs_ioctl() for management of all interrupt
>> types to protect against concurrent changes to the eventfds associated
>> with device request notification and error interrupts.
>>
>> The igate mutex is not acquired consistently. The mutex is always
>> (for all interrupt types) acquired from within vfio_pci_ioctl_set_irqs()
>> before calling vfio_pci_set_irqs_ioctl(), but vfio_pci_set_irqs_ioctl() is
>> called via vfio_pci_core_disable() without the mutex held. The latter
>> is expected to be correct if the code flow can be guaranteed that
>> the provided interrupt type is not a device request notification or error
>> interrupt.
>
> The latter is correct because it's always a physical interrupt type
> (INTx/MSI/MSIX), vdev->irq_type dictates this, and the interrupt code
> prevents the handler from being called after the interrupt is disabled.

Thank you for confirming.

> It's intentional that we don't acquire igate here since we only need to
> prevent a race with concurrent user access, which cannot occur in the
> fd release path. The igate mutex is acquired consistently, where it's
> required.

Thank you. I do think it will be helpful to document some of this in
the code to help newcomers distinguish the scenarios (more below).

> It would be more forthcoming to describe that potential future emulated
> device interrupts don't make the same guarantees, but if that's true,
> why can't they?

As I understand it, an emulated interrupt will be triggered by the VFIO
PCI driver as a result of, for example, an MMIO write from user space.
I thus expect locking similar to that of the existing device request
notification and error interrupts. I would like to focus this series on
existing flows though.

>> Move igate mutex acquire and release into vfio_pci_set_irqs_ioctl()
>> to make the locking consistent irrespective of interrupt type.
>> This is one step closer to contain the interrupt management locking
>> internals within the interrupt management code so that the VFIO PCI
>> core can trigger management of the eventfds associated with device
>> request notification and error interrupts without needing to access
>> and manipulate VFIO interrupt management locks and data.
>
> If all we want to do is move the mutex into vfio_pci_intrs.c then we
> could rename to __vfio_pci_set_irqs_ioctl() and create a wrapper around
> it that grabs the mutex. The disable path could use the lockless
> version and we wouldn't need to clutter the exit path unlocking the
> mutex as done below. Thanks,

Will do. This creates an opportunity to document the flows involving
the mutex (essentially adding comments that include your description
above).

Reinette
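The wrapper approach discussed in this reply can be illustrated with a small userspace model. This is only a sketch and not the code that was eventually submitted: `struct fake_vdev`, `fake_set_irqs()`, `fake_set_irqs_unlocked()` and `fake_handler()` are hypothetical stand-ins, a pthread mutex models `vdev->igate`, and index 0 is arbitrarily treated as the only supported interrupt index. The `_unlocked` suffix plays the role of the `__` prefix proposed for the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct vfio_pci_core_device. */
struct fake_vdev {
	pthread_mutex_t igate;	/* models vdev->igate */
	int calls;		/* how many times a handler ran */
};

/* Initialise the model device; returns 0 on success. */
int fake_vdev_init(struct fake_vdev *vdev)
{
	vdev->calls = 0;
	return pthread_mutex_init(&vdev->igate, NULL);
}

/* Stand-in for a per-index handler such as the INTx or MSI setters. */
static int fake_handler(struct fake_vdev *vdev, unsigned int index)
{
	(void)index;
	vdev->calls++;
	return 0;
}

/*
 * Lockless variant. The caller must either hold igate or guarantee that
 * no concurrent user access is possible, which is the argument made in
 * the thread for the fd release (disable) path.
 */
int fake_set_irqs_unlocked(struct fake_vdev *vdev, unsigned int index)
{
	int (*func)(struct fake_vdev *vdev, unsigned int index) = NULL;

	if (index == 0)		/* assumption: only index 0 is supported */
		func = fake_handler;

	if (!func)
		return -ENOTTY;

	return func(vdev, index);
}

/*
 * Locked wrapper for the ioctl path. All locking lives here, so the
 * lockless function needs no unlock label on its exit paths.
 */
int fake_set_irqs(struct fake_vdev *vdev, unsigned int index)
{
	int ret;

	pthread_mutex_lock(&vdev->igate);
	ret = fake_set_irqs_unlocked(vdev, index);
	pthread_mutex_unlock(&vdev->igate);

	return ret;
}
```

In this model the ioctl path would call `fake_set_irqs()` and the disable path would call `fake_set_irqs_unlocked()` directly. The early `return -ENOTTY` stays a plain return because the mutex is never taken inside the lockless function.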
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1cbc990d42e0..d2847ca2f0cb 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1214,12 +1214,9 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
 			return PTR_ERR(data);
 	}
 
-	mutex_lock(&vdev->igate);
-
 	ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
 				      hdr.count, data);
 
-	mutex_unlock(&vdev->igate);
 	kfree(data);
 
 	return ret;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 69ab11863282..97a3bb22b186 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -793,7 +793,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
 		    unsigned int start, unsigned int count, uint32_t flags,
 		    void *data) = NULL;
+	int ret = -ENOTTY;
 
+	mutex_lock(&vdev->igate);
 	switch (index) {
 	case VFIO_PCI_INTX_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
@@ -838,7 +840,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 	}
 
 	if (!func)
-		return -ENOTTY;
+		goto out_unlock;
+
+	ret = func(vdev, index, start, count, flags, data);
+out_unlock:
+	mutex_unlock(&vdev->igate);
+	return ret;
 
-	return func(vdev, index, start, count, flags, data);
 }
vfio_pci_set_irqs_ioctl() is the entrypoint for interrupt management
via the VFIO_DEVICE_SET_IRQS ioctl(). The igate mutex is obtained
before calling vfio_pci_set_irqs_ioctl() for management of all interrupt
types to protect against concurrent changes to the eventfds associated
with device request notification and error interrupts.

The igate mutex is not acquired consistently. The mutex is always
(for all interrupt types) acquired from within vfio_pci_ioctl_set_irqs()
before calling vfio_pci_set_irqs_ioctl(), but vfio_pci_set_irqs_ioctl() is
called via vfio_pci_core_disable() without the mutex held. The latter
is expected to be correct if the code flow can be guaranteed that
the provided interrupt type is not a device request notification or error
interrupt.

Move igate mutex acquire and release into vfio_pci_set_irqs_ioctl()
to make the locking consistent irrespective of interrupt type.
This is one step closer to contain the interrupt management locking
internals within the interrupt management code so that the VFIO PCI
core can trigger management of the eventfds associated with device
request notification and error interrupts without needing to access
and manipulate VFIO interrupt management locks and data.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Note to maintainers:
Originally formed part of the IMS submission below, but is not
specific to IMS.
https://lore.kernel.org/lkml/cover.1696609476.git.reinette.chatre@intel.com

 drivers/vfio/pci/vfio_pci_core.c  |  3 ---
 drivers/vfio/pci/vfio_pci_intrs.c | 10 ++++++++--
 2 files changed, 8 insertions(+), 5 deletions(-)