Message ID | 1665222631-202970-1-git-send-email-chenxiang66@hisilicon.com (mailing list archive)
---|---
State | New, archived
Series | Revert 'vfio: Delete container_q'
On Sat, Oct 08, 2022 at 05:50:31PM +0800, chenxiang wrote:
> From: Xiang Chen <chenxiang66@hisilicon.com>
>
> We found an issue on an ARM64 platform with HNS3 VF SRIOV enabled (VFIO
> passthrough in qemu): kill the qemu thread, then echo 0 > sriov_numvfs to
> disable sriov immediately; sometimes we see the following warnings:

I suspect this is fixed in vfio-next now, in a different way. Please check.

> After removing container_q, the arm_smmu_release_dev() caused by disabling
> sriov may occur before the arm_smmu_attach_dev() triggered by echo 0 >
> sriov_numvfs, and arm_smmu_attach_dev() may refer to a freed iommu_fwspec,
> which causes the above warnings.

This is effectively the same issue s390 already hit.

It is interesting that container_q was solving this; that seems to be an
inadvertent side effect nobody noticed.

Jason
Hi Jason,

On 2022/10/11 2:27, Jason Gunthorpe wrote:
> On Sat, Oct 08, 2022 at 05:50:31PM +0800, chenxiang wrote:
>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>
>> We found an issue on an ARM64 platform with HNS3 VF SRIOV enabled (VFIO
>> passthrough in qemu): kill the qemu thread, then echo 0 > sriov_numvfs to
>> disable sriov immediately; sometimes we see the following warnings:
> I suspect this is fixed in vfio-next now, in a different way. Please check.

Can you point out which patches fix it?
I need to merge those patches back into our version and then test.

>> After removing container_q, the arm_smmu_release_dev() caused by disabling
>> sriov may occur before the arm_smmu_attach_dev() triggered by echo 0 >
>> sriov_numvfs, and arm_smmu_attach_dev() may refer to a freed iommu_fwspec,
>> which causes the above warnings.
> This is effectively the same issue s390 already hit.
>
> It is interesting that container_q was solving this; that seems to be an
> inadvertent side effect nobody noticed.
>
> Jason
> From: chenxiang (M) <chenxiang66@hisilicon.com>
> Sent: Tuesday, October 11, 2022 10:31 AM
>
> Hi Jason,
>
> On 2022/10/11 2:27, Jason Gunthorpe wrote:
>> On Sat, Oct 08, 2022 at 05:50:31PM +0800, chenxiang wrote:
>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>
>>> We found an issue on an ARM64 platform with HNS3 VF SRIOV enabled (VFIO
>>> passthrough in qemu): kill the qemu thread, then echo 0 > sriov_numvfs to
>>> disable sriov immediately; sometimes we see the following warnings:
>> I suspect this is fixed in vfio-next now, in a different way. Please check.
>
> Can you point out which patches fix it?
> I need to merge those patches back into our version and then test.

commit ca5f21b2574903a7430fcb3590e534d92b2fa816
Author: Jason Gunthorpe <jgg@nvidia.com>
Date:   Thu Sep 22 21:06:10 2022 -0300

    vfio: Follow a strict lifetime for struct iommu_group
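The "merge back and test" step mentioned above is a plain cherry-pick of the upstream commit onto a downstream branch. A minimal sketch of that workflow, demonstrated in a throwaway repo (the branch names and file are placeholders; against a real kernel tree the SHA would be ca5f21b2574903a7430fcb3590e534d92b2fa816 from vfio-next):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you

# Downstream base.
echo base > vfio_main.c
git add vfio_main.c
git commit -qm "base"
start=$(git rev-parse --abbrev-ref HEAD)

# Simulated upstream branch carrying the fix.
git checkout -qb upstream
echo "strict iommu_group lifetime" >> vfio_main.c
git commit -qam "vfio: Follow a strict lifetime for struct iommu_group"
fix=$(git rev-parse HEAD)

# Back on the downstream branch, pull the fix over for testing.
git checkout -q "$start"
git checkout -qb downstream
git cherry-pick "$fix"
git log --oneline -1
```

The cherry-pick preserves the upstream subject line, which makes the backport easy to spot in `git log` later.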
Hi,

On 2022/10/11 10:49, Tian, Kevin wrote:
>> From: chenxiang (M) <chenxiang66@hisilicon.com>
>> Sent: Tuesday, October 11, 2022 10:31 AM
>>
>> Hi Jason,
>>
>> On 2022/10/11 2:27, Jason Gunthorpe wrote:
>>> On Sat, Oct 08, 2022 at 05:50:31PM +0800, chenxiang wrote:
>>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>>
>>>> We found an issue on an ARM64 platform with HNS3 VF SRIOV enabled (VFIO
>>>> passthrough in qemu): kill the qemu thread, then echo 0 > sriov_numvfs to
>>>> disable sriov immediately; sometimes we see the following warnings:
>>> I suspect this is fixed in vfio-next now, in a different way. Please check.
>> Can you point out which patches fix it?
>> I need to merge those patches back into our version and then test.
>
> commit ca5f21b2574903a7430fcb3590e534d92b2fa816
> Author: Jason Gunthorpe <jgg@nvidia.com>
> Date:   Thu Sep 22 21:06:10 2022 -0300
>
>     vfio: Follow a strict lifetime for struct iommu_group

I merged the patch back and tested it; it solves the issue. Thanks!
```diff
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 7cb56c3..890fdf9 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -73,6 +73,7 @@ struct vfio_group {
 	struct mutex			device_lock;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
+	wait_queue_head_t		container_q;
 	enum vfio_group_type		type;
 	unsigned int			dev_counter;
 	struct rw_semaphore		group_rwsem;
@@ -367,6 +368,7 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 	init_rwsem(&group->group_rwsem);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
+	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
 	/* put in vfio_group_release() */
 	iommu_group_ref_get(iommu_group);
@@ -699,6 +701,9 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	group->dev_counter--;
 	mutex_unlock(&group->device_lock);
 
+	if (list_empty(&group->device_list))
+		wait_event(group->container_q, !group->container);
+
 	if (group->type == VFIO_NO_IOMMU || group->type == VFIO_EMULATED_IOMMU)
 		iommu_group_remove_device(device->dev);
 
@@ -946,6 +951,7 @@ static void __vfio_group_unset_container(struct vfio_group *group)
 	iommu_group_release_dma_owner(group->iommu_group);
 
 	group->container = NULL;
+	wake_up(&group->container_q);
 	group->container_users = 0;
 	list_del(&group->container_next);
```