Message ID | 170311143880.2826.17853753430536108145.stgit@bgt-140510-bm01.eng.stellus.in (mailing list archive) |
---|---|
Series | pci/iov: avoid device_lock() when reading sriov_numvfs |
[+cc Pierre, author of 35ff867b7657 ("PCI/IOV: Serialize sysfs
sriov_numvfs reads vs writes")]

On Wed, Dec 20, 2023 at 10:58:12PM +0000, Jim Harris wrote:
> If SR-IOV enabled device is held by vfio, and device is removed,
> vfio will hold device lock and notify userspace of the removal. If
> userspace reads sriov_numvfs sysfs entry, that thread will be
> blocked since sriov_numvfs_show() also tries to acquire the device
> lock. If that same thread is responsible for releasing the device to
> vfio, it results in a deadlock.
>
> One patch was proposed to add a separate mutex, specifically for
> struct pci_sriov, to synchronize access to sriov_numvfs in the sysfs
> paths (replacing use of the device_lock()). Leon instead suggested
> just reverting the commit 35ff867b765 which introduced device_lock()
> in the store path. This also led to a small fix around ordering on
> the kobject_uevent() when sriov_numvfs is updated.
>
> Ref: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/

1) Cc author of the commit being reverted (Pierre) so he has a chance
to chime in and make sure the proposed fix works for him as well.

2) The revert commit log needs to justify the revert, not merely say
what the proper way is. The Ref: above suggests that the current code
(pre-revert) leads to a deadlock in some cases, so the revert commit
log should detail that.

It's ideal if we never regress, not even between the revert and the
second patch, so it's possible that they should be squashed into a
single patch. But if you keep it as two patches, it's trivial for me
to squash them if we decide that's best.

3) Follow subject line convention for drivers/pci (use "git log
--oneline drivers/pci" to learn it).

I did 1) here and could do 3) for you, but it would be better if you
could update and repost the series with 2) updated.

In the meantime you may notice that I pushed these on a
pci/virtualization just to get the 0-day bot to build test it. I
propose to replace that branch with an updated series, since the code
changes themselves probably will stay the same.

> ---
>
> Jim Harris (2):
>       Revert "PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes"
>       pci/iov: fix kobject_uevent() ordering in sriov_enable()
>
>
>  drivers/pci/iov.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> --
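
For readers following along, the lock dependency described in the quoted cover letter can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: struct fake_dev, fake_trylock(), fake_unlock(), remove_and_notify() and the two show helpers are hypothetical names, the "device lock" is a C11 atomic_flag, and the removal path and the userspace reader are collapsed into one thread via a callback. A trylock is used so that the sketch reports EBUSY where the real sriov_numvfs_show() would block.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct fake_dev {
    atomic_flag locked;   /* plays the role of device_lock() */
    int num_vfs;          /* plays the role of the sriov_numvfs value */
};

typedef int (*show_fn)(struct fake_dev *dev, int *out);

/* Returns true if the lock was free and is now held by the caller. */
bool fake_trylock(struct fake_dev *dev)
{
    return !atomic_flag_test_and_set(&dev->locked);
}

void fake_unlock(struct fake_dev *dev)
{
    atomic_flag_clear(&dev->locked);
}

/* Read that takes the device lock, modeled on the pre-revert show path.
 * Returns EBUSY where a blocking lock would wait forever. */
int numvfs_show_locked(struct fake_dev *dev, int *out)
{
    if (!fake_trylock(dev))
        return EBUSY;
    *out = dev->num_vfs;
    fake_unlock(dev);
    return 0;
}

/* Read that takes no lock, modeled on the show path after the revert. */
int numvfs_show_lockless(struct fake_dev *dev, int *out)
{
    *out = dev->num_vfs;
    return 0;
}

/* Removal path: hold the lock while "userspace" (the callback) reacts. */
int remove_and_notify(struct fake_dev *dev, show_fn on_remove, int *out)
{
    int rc;

    assert(on_remove != NULL);
    if (!fake_trylock(dev))
        return EBUSY;
    rc = on_remove(dev, out);   /* userspace reads sriov_numvfs here */
    fake_unlock(dev);
    return rc;
}
```

Calling remove_and_notify() with numvfs_show_locked as the callback returns EBUSY, which corresponds to the blocked reader in the report; with numvfs_show_lockless the read completes while the removal path still holds the lock.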
On Thu, Feb 08, 2024 at 06:30:02PM -0600, Bjorn Helgaas wrote:
> [+cc Pierre, author of 35ff867b7657 ("PCI/IOV: Serialize sysfs
> sriov_numvfs reads vs writes")]
>
> On Wed, Dec 20, 2023 at 10:58:12PM +0000, Jim Harris wrote:
> > If SR-IOV enabled device is held by vfio, and device is removed,
> > vfio will hold device lock and notify userspace of the removal. If
> > userspace reads sriov_numvfs sysfs entry, that thread will be
> > blocked since sriov_numvfs_show() also tries to acquire the device
> > lock. If that same thread is responsible for releasing the device to
> > vfio, it results in a deadlock.
> >
> > One patch was proposed to add a separate mutex, specifically for
> > struct pci_sriov, to synchronize access to sriov_numvfs in the sysfs
> > paths (replacing use of the device_lock()). Leon instead suggested
> > just reverting the commit 35ff867b765 which introduced device_lock()
> > in the store path. This also led to a small fix around ordering on
> > the kobject_uevent() when sriov_numvfs is updated.
> >
> > Ref: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/
>
> 1) Cc author of the commit being reverted (Pierre) so he has a chance
> to chime in and make sure the proposed fix works for him as well.

Ack. I'll also Cc Pierre on the v2.

> 2) The revert commit log needs to justify the revert, not merely say
> what the proper way is. The Ref: above suggests that the current code
> (pre-revert) leads to a deadlock in some cases, so the revert commit
> log should detail that.
>
> It's ideal if we never regress, not even between the revert and the
> second patch, so it's possible that they should be squashed into a
> single patch. But if you keep it as two patches, it's trivial for me
> to squash them if we decide that's best.

The deadlock I hit is fixed by patch 1 alone. Patch 2 is a separate
bug - it's better to update the num_VFs value before sending the
notification that the num_VFs value changed. I'll add some more color
to that commit message too, to differentiate it from the revert.

I have no issues if you eventually decide to squash them.

>
> 3) Follow subject line convention for drivers/pci (use "git log
> --oneline drivers/pci" to learn it).

Will fix in v2.

Thanks,

Jim
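
The ordering point behind patch 2 can be shown with a small user-space sketch. It is an illustration only and does not reproduce sriov_enable(): struct fake_iov, record_numvfs() and the two enable_* helpers are hypothetical names, and the "uevent" is a plain callback that records whatever value is visible when the notification fires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the SR-IOV state; not a kernel structure. */
struct fake_iov {
    int num_vfs;                                        /* like num_VFs */
    void (*uevent)(const struct fake_iov *iov, void *ctx); /* listener */
    void *ctx;                                          /* listener data */
};

/* Listener: record the num_vfs value visible at notification time. */
void record_numvfs(const struct fake_iov *iov, void *ctx)
{
    *(int *)ctx = iov->num_vfs;
}

/* Notify first, update second: the listener can observe the old value. */
void enable_notify_first(struct fake_iov *iov, int nr_virtfn)
{
    assert(iov->uevent != NULL);
    iov->uevent(iov, iov->ctx);
    iov->num_vfs = nr_virtfn;
}

/* Update first, notify second: the listener observes the new value. */
void enable_update_first(struct fake_iov *iov, int nr_virtfn)
{
    assert(iov->uevent != NULL);
    iov->num_vfs = nr_virtfn;
    iov->uevent(iov, iov->ctx);
}
```

With a listener that reads the count as soon as it is notified, enable_notify_first() lets it record the stale value, while enable_update_first() guarantees the recorded value matches the new count.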