Message ID | 3-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com |
---|---|
State | New, archived |
Series | Fix splats releated to using the iommu_group after destroying devices |
On 9/8/22 2:45 PM, Jason Gunthorpe wrote:
> The iommu_group comes from the struct device that a driver has been bound
> to and then created a struct vfio_device against. To keep the iommu layer
> sane we want to have a simple rule that only an attached driver should be
> using the iommu API. Particularly only an attached driver should hold
> ownership.
>
> In VFIO's case since it uses the group APIs and it shares between
> different drivers it is a bit more complicated, but the principle still
> holds.
>
> Solve this by waiting for all users of the vfio_group to stop before
> allowing vfio_unregister_group_dev() to complete. This is done with a new
> completion to know when the users go away and an additional refcount to
> keep track of how many device drivers are sharing the vfio group. The last
> driver to be unregistered will clean up the group.
>
> This solves crashes in the S390 iommu driver that come because VFIO ends
> up racing releasing ownership (which attaches the default iommu_domain to
> the device) with the removal of that same device from the iommu
> driver. This is a side case that iommu drivers should not have to cope
> with.
>
> iommu driver failed to attach the default/blocking domain
> WARNING: CPU: 0 PID: 5082 at drivers/iommu/iommu.c:1961 iommu_detach_group+0x6c/0x80
> Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass vfio_virqfd kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib sunrpc ib_uverbs ism smc uvdevice ib_core s390_trng eadm_sch tape_3590 tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 mlx5_core des_s390 libdes sha3_512_s390 nvme sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme_core zfcp scsi_transport_fc pkey zcrypt rng_core autofs4
> CPU: 0 PID: 5082 Comm: qemu-system-s39 Tainted: G W 6.0.0-rc3 #5
> Hardware name: IBM 3931 A01 782 (LPAR)
> Krnl PSW : 0704c00180000000 000000095bb10d28 (iommu_detach_group+0x70/0x80)
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000001 0000000900000027 0000000000000039 000000095c97ffe0
>            00000000fffeffff 00000009fc290000 00000000af1fda50 00000000af590b58
>            00000000af1fdaf0 0000000135c7a320 0000000135e52258 0000000135e52200
>            00000000a29e8000 00000000af590b40 000000095bb10d24 0000038004b13c98
> Krnl Code: 000000095bb10d18: c020003d56fc larl %r2,000000095c2bbb10
>            000000095bb10d1e: c0e50019d901 brasl %r14,000000095be4bf20
>           #000000095bb10d24: af000000 mc 0,0
>           >000000095bb10d28: b904002a lgr %r2,%r10
>            000000095bb10d2c: ebaff0a00004 lmg %r10,%r15,160(%r15)
>            000000095bb10d32: c0f4001aa867 brcl 15,000000095be65e00
>            000000095bb10d38: c004002168e0 brcl 0,000000095bf3def8
>            000000095bb10d3e: eb6ff0480024 stmg %r6,%r15,72(%r15)
> Call Trace:
>  [<000000095bb10d28>] iommu_detach_group+0x70/0x80
> ([<000000095bb10d24>] iommu_detach_group+0x6c/0x80)
>  [<000003ff80243b0e>] vfio_iommu_type1_detach_group+0x136/0x6c8 [vfio_iommu_type1]
>  [<000003ff80137780>] __vfio_group_unset_container+0x58/0x158 [vfio]
>  [<000003ff80138a16>] vfio_group_fops_unl_ioctl+0x1b6/0x210 [vfio]
> pci 0004:00:00.0: Removing from iommu group 4
>  [<000000095b5b62e8>] __s390x_sys_ioctl+0xc0/0x100
>  [<000000095be5d3b4>] __do_syscall+0x1d4/0x200
>  [<000000095be6c072>] system_call+0x82/0xb0
> Last Breaking-Event-Address:
>  [<000000095be4bf80>] __warn_printk+0x60/0x68
>
> It reflects that domain->ops->attach_dev() failed because the driver has
> already passed the point of destructing the device.
>
> Fixes: 9ac8545199a1 ("iommu: Fix use-after-free in iommu_release_device")
> Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

I've been running with only the first 3 patches in this series (the vfio
changes) and can confirm that they resolve the reported issue for me.

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390

...

> +static void vfio_group_remove(struct vfio_group *group)
> +{
> +	/* Pairs with vfio_create_group() */

Nit: vfio_create_group() no longer exists as of patch 1
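The commit message describes the fix as a second refcount (drivers) plus a completion that fires when users reaches zero. The following is a minimal userspace sketch of that lifetime rule, not the VFIO code itself: it assumes pthreads and C11 atomics as stand-ins for the kernel's refcount_t, struct completion and refcount_dec_and_mutex_lock(), and every name in it (struct group_sketch, group_remove(), dec_and_lock() and so on) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void completion_complete(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical model of the fields the patch adds to struct vfio_group. */
struct group_sketch {
	atomic_uint drivers;             /* bound drivers sharing the group */
	atomic_uint users;               /* holders that may touch iommu_group */
	struct completion_sketch users_comp;
	void *iommu_group;               /* valid only until teardown finishes */
};

static void group_init(struct group_sketch *g, void *iommu_group)
{
	atomic_init(&g->drivers, 1);     /* the registering driver */
	atomic_init(&g->users, 1);       /* the allocation reference */
	completion_init(&g->users_comp);
	g->iommu_group = iommu_group;
}

/* Same contract as refcount_inc_not_zero(): never revive a dying group. */
static bool group_try_get(struct group_sketch *g)
{
	unsigned int old = atomic_load(&g->users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&g->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Like the new vfio_group_put(): the last user signals the completion. */
static void group_put(struct group_sketch *g)
{
	if (atomic_fetch_sub(&g->users, 1) == 1)
		completion_complete(&g->users_comp);
}

/*
 * Simplified refcount_dec_and_mutex_lock(): always takes the lock, and
 * returns true with it still held only if this call reached zero.
 */
static bool dec_and_lock(atomic_uint *count, pthread_mutex_t *lock)
{
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) != 1) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;
}

/* Like vfio_group_remove(): only the last driver tears the group down. */
static void group_remove(struct group_sketch *g, pthread_mutex_t *global_lock)
{
	if (!dec_and_lock(&g->drivers, global_lock))
		return;
	/* The real code unlinks the group and deletes its cdev here. */
	pthread_mutex_unlock(global_lock);

	group_put(g);                    /* drop the allocation reference */
	completion_wait(&g->users_comp); /* block until every user is gone */
	g->iommu_group = NULL;           /* nobody can observe it any more */
}
```

In this sketch only the call that drops drivers to zero pays for the wait, so unbinding one of several drivers sharing a group returns at once, while the last one blocks until every holder of a users reference has called group_put().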
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index ba8b6bed12c7e7..3bd6ec4cdd5b26 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -66,7 +66,15 @@ struct vfio_container {
 struct vfio_group {
 	struct device dev;
 	struct cdev cdev;
+	/*
+	 * When drivers is non-zero a driver is attached to the struct device
+	 * that provided the iommu_group and thus the iommu_group is a valid
+	 * pointer. When drivers is 0 the driver is being detached. Once users
+	 * reaches 0 then the iommu_group is invalid.
+	 */
+	refcount_t drivers;
 	refcount_t users;
+	struct completion users_comp;
 	unsigned int container_users;
 	struct iommu_group *iommu_group;
 	struct vfio_container *container;
@@ -276,8 +284,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);

-static void vfio_group_get(struct vfio_group *group);
-
 /*
  * Container objects - containers are created when /dev/vfio/vfio is
  * opened, but their lifecycle extends until the last user is done, so
@@ -353,6 +359,8 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 	group->cdev.owner = THIS_MODULE;

 	refcount_set(&group->users, 1);
+	refcount_set(&group->drivers, 1);
+	init_completion(&group->users_comp);
 	init_rwsem(&group->group_rwsem);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
@@ -401,7 +409,7 @@ static struct vfio_group *vfio_get_group(struct device *dev,
 			goto out_unlock;
 		}
 		/* Found an existing group */
-		vfio_group_get(ret);
+		refcount_inc(&ret->drivers);
 		goto out_unlock;
 	}

@@ -437,8 +445,36 @@ static struct vfio_group *vfio_get_group(struct device *dev,

 static void vfio_group_put(struct vfio_group *group)
 {
-	if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
+	if (refcount_dec_and_test(&group->users))
+		complete(&group->users_comp);
+}
+
+static void vfio_group_remove(struct vfio_group *group)
+{
+	/* Pairs with vfio_create_group() */
+	if (!refcount_dec_and_mutex_lock(&group->drivers, &vfio.group_lock))
 		return;
+	list_del(&group->vfio_next);
+
+	/*
+	 * We could concurrently probe another driver in the group that might
+	 * race vfio_group_remove() with vfio_get_group(), so we have to ensure
+	 * that the sysfs is all cleaned up under lock otherwise the
+	 * cdev_device_add() will fail due to the name aready existing.
+	 */
+	cdev_device_del(&group->cdev, &group->dev);
+	mutex_unlock(&vfio.group_lock);
+
+	/* Matches the get from vfio_group_alloc() */
+	vfio_group_put(group);
+
+	/*
+	 * Before we allow the last driver in the group to be unplugged the
+	 * group must be sanitized so nothing else is or can reference it. This
+	 * is because the group->iommu_group pointer should only be used so long
+	 * as a device driver is attached to a device in the group.
+	 */
+	wait_for_completion(&group->users_comp);

 	/*
 	 * These data structures all have paired operations that can only be
@@ -449,19 +485,11 @@ static void vfio_group_put(struct vfio_group *group)
 	WARN_ON(!list_empty(&group->device_list));
 	WARN_ON(group->container || group->container_users);
 	WARN_ON(group->notifier.head);
-
-	list_del(&group->vfio_next);
-	cdev_device_del(&group->cdev, &group->dev);
-	mutex_unlock(&vfio.group_lock);
+	group->iommu_group = NULL;

 	put_device(&group->dev);
 }

-static void vfio_group_get(struct vfio_group *group)
-{
-	refcount_inc(&group->users);
-}
-
 /*
  * Device objects - create, release, get, put, search
  */
@@ -573,6 +601,10 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 static int __vfio_register_dev(struct vfio_device *device,
 		struct vfio_group *group)
 {
+	/*
+	 * In all cases group is the output of one of the group allocation
+	 * functions and we have group->drivers incremented for us.
+	 */
 	if (IS_ERR(group))
 		return PTR_ERR(group);

@@ -683,8 +715,7 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	if (group->type == VFIO_NO_IOMMU || group->type == VFIO_EMULATED_IOMMU)
 		iommu_group_remove_device(device->dev);

-	/* Matches the get in vfio_register_group_dev() */
-	vfio_group_put(group);
+	vfio_group_remove(group);
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);

@@ -1272,7 +1303,7 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)

 	down_write(&group->group_rwsem);

-	/* users can be zero if this races with vfio_group_put() */
+	/* users can be zero if this races with vfio_group_remove() */
 	if (!refcount_inc_not_zero(&group->users)) {
 		ret = -ENODEV;
 		goto err_unlock;
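The last hunk keeps vfio_group_fops_open() on refcount_inc_not_zero() for users. A hedged sketch of why that check matters, using C11 atomics and hypothetical names (struct group_users, users_inc_not_zero(), group_open_sketch()) rather than the kernel API: once users has reached zero, complete() has fired and the remover may already be past wait_for_completion() and clearing iommu_group, so an opener has to fail with -ENODEV instead of moving the count from 0 back to 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: only the users counter of the group matters here. */
struct group_users {
	atomic_uint users;
};

/* Same contract as refcount_inc_not_zero(): never move 0 -> 1. */
static bool users_inc_not_zero(atomic_uint *users)
{
	unsigned int old = atomic_load(users);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(users, &old, old + 1))
			return true;
	}
	return false;
}

/* Shape of the check in vfio_group_fops_open(), minus the rwsem. */
static int group_open_sketch(struct group_users *g)
{
	if (!users_inc_not_zero(&g->users))
		return -ENODEV; /* group is being torn down */
	return 0;
}
```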