Message ID | 20230404020545.32359-7-decui@microsoft.com (mailing list archive) |
---|---|
State | Superseded |
Series | pci-hyper: Fix race condition bugs for fast device hotplug |

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, April 3, 2023 7:06 PM
>
> Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject") added
> pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
> create_root_hv_pci_bus() and in hv_eject_device_work() to address the
> race between create_root_hv_pci_bus() and hv_eject_device_work(), but it
> turns that grubing the pci_rescan_remove_lock mutex is not enough:
> refer to the earlier fix "PCI: hv: Add a per-bus mutex state_lock".
>
> Now with hbus->state_lock and other fixes, the race is resolved, so
> remove pci_{lock,unlock}_rescan_remove() in create_root_hv_pci_bus():
> this removes the serialization in hv_pci_probe() and hence allows
> async-probing (PROBE_PREFER_ASYNCHRONOUS) to work.
>
> Add the async-probing flag to hv_pci_drv.
>
> pci_{lock,unlock}_rescan_remove() in hv_eject_device_work() and in
> hv_pci_remove() are still kept: according to the comment before
> drivers/pci/probe.c: static DEFINE_MUTEX(pci_rescan_remove_lock),
> "PCI device removal routines should always be executed under this mutex".
>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
>
> v2:
>   No change to the patch body.
>   Improved the commit message [Michael Kelley]
>   Added Cc:stable
>
>  drivers/pci/controller/pci-hyperv.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 3ae2f99dea8c2..2ea2b1b8a4c9a 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2312,12 +2312,16 @@ static int create_root_hv_pci_bus(struct
> hv_pcibus_device *hbus)
>  	if (error)
>  		return error;
>
> -	pci_lock_rescan_remove();
> +	/*
> +	 * pci_lock_rescan_remove() and pci_unlock_rescan_remove() are
> +	 * unnecessary here, because we hold the hbus->state_lock, meaning
> +	 * hv_eject_device_work() and pci_devices_present_work() can't race
> +	 * with create_root_hv_pci_bus().
> +	 */
>  	hv_pci_assign_numa_node(hbus);
>  	pci_bus_assign_resources(bridge->bus);
>  	hv_pci_assign_slots(hbus);
>  	pci_bus_add_devices(bridge->bus);
> -	pci_unlock_rescan_remove();
>  	hbus->state = hv_pcibus_installed;
>  	return 0;
>  }
> @@ -4003,6 +4007,9 @@ static struct hv_driver hv_pci_drv = {
>  	.remove = hv_pci_remove,
>  	.suspend = hv_pci_suspend,
>  	.resume = hv_pci_resume,
> +	.driver = {
> +		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
> +	},
>  };
>
>  static void __exit exit_hv_pci_drv(void)
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

From: Michael Kelley (LINUX) <mikelley@microsoft.com> Sent: Friday, April 7, 2023 9:12 AM
>
> From: Dexuan Cui <decui@microsoft.com> Sent: Monday, April 3, 2023 7:06 PM
> >
> > Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject") added
> > pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
> > create_root_hv_pci_bus() and in hv_eject_device_work() to address the
> > race between create_root_hv_pci_bus() and hv_eject_device_work(), but it
> > turns that grubing the pci_rescan_remove_lock mutex is not enough:

There's some kind of spelling error or typo above. Should "grubing" be
"grabbing"? Or did you intend something else?

Michael

> > refer to the earlier fix "PCI: hv: Add a per-bus mutex state_lock".
> >
> > Now with hbus->state_lock and other fixes, the race is resolved, so
> > remove pci_{lock,unlock}_rescan_remove() in create_root_hv_pci_bus():
> > this removes the serialization in hv_pci_probe() and hence allows
> > async-probing (PROBE_PREFER_ASYNCHRONOUS) to work.
> >
> > Add the async-probing flag to hv_pci_drv.
> >
> > pci_{lock,unlock}_rescan_remove() in hv_eject_device_work() and in
> > hv_pci_remove() are still kept: according to the comment before
> > drivers/pci/probe.c: static DEFINE_MUTEX(pci_rescan_remove_lock),
> > "PCI device removal routines should always be executed under this mutex".
> >
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > Cc: stable@vger.kernel.org

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Friday, April 7, 2023 9:15 AM
> ...
> > > Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject") added
> > > pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
> > > create_root_hv_pci_bus() and in hv_eject_device_work() to address the
> > > race between create_root_hv_pci_bus() and hv_eject_device_work(), but
> > > it turns that grubing the pci_rescan_remove_lock mutex is not enough:
>
> There's some kind of spelling error or typo above. Should "grubing" be
> "grabbing"? Or did you intend something else?
>
> Michael

Sorry, it's a typo. The "grubing" should be "grabbing".
I suppose the PCI maintainers can help fix this. Let me know if v3 is needed.

> Subject: RE: [PATCH v2 6/6] PCI: hv: Use async probing to reduce boot time
>
> > From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> > Sent: Friday, April 7, 2023 9:15 AM
> > ...
> > > > Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject")
> > > > added
> > > > pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
> > > > create_root_hv_pci_bus() and in hv_eject_device_work() to address
> > > > the race between create_root_hv_pci_bus() and
> > > > hv_eject_device_work(), but it turns that grubing the
> > > > pci_rescan_remove_lock mutex is not enough:
> >
> > There's some kind of spelling error or typo above. Should "grubing"
> > be "grabbing"? Or did you intend something else?
> >
> > Michael
>
> Sorry, it's a typo. The "grubing" should be "grabbing".
> I suppose the PCI maintainers can help fix this. Let me know if v3 is needed.

Other than the typo,
Reviewed-by: Long Li <longli@microsoft.com>

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 3ae2f99dea8c2..2ea2b1b8a4c9a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2312,12 +2312,16 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus)
 	if (error)
 		return error;
 
-	pci_lock_rescan_remove();
+	/*
+	 * pci_lock_rescan_remove() and pci_unlock_rescan_remove() are
+	 * unnecessary here, because we hold the hbus->state_lock, meaning
+	 * hv_eject_device_work() and pci_devices_present_work() can't race
+	 * with create_root_hv_pci_bus().
+	 */
 	hv_pci_assign_numa_node(hbus);
 	pci_bus_assign_resources(bridge->bus);
 	hv_pci_assign_slots(hbus);
 	pci_bus_add_devices(bridge->bus);
-	pci_unlock_rescan_remove();
 	hbus->state = hv_pcibus_installed;
 	return 0;
 }
@@ -4003,6 +4007,9 @@ static struct hv_driver hv_pci_drv = {
 	.remove = hv_pci_remove,
 	.suspend = hv_pci_suspend,
 	.resume = hv_pci_resume,
+	.driver = {
+		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
+	},
 };
 
 static void __exit exit_hv_pci_drv(void)

Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject") added
pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
create_root_hv_pci_bus() and in hv_eject_device_work() to address the
race between create_root_hv_pci_bus() and hv_eject_device_work(), but it
turns that grubing the pci_rescan_remove_lock mutex is not enough:
refer to the earlier fix "PCI: hv: Add a per-bus mutex state_lock".

Now with hbus->state_lock and other fixes, the race is resolved, so
remove pci_{lock,unlock}_rescan_remove() in create_root_hv_pci_bus():
this removes the serialization in hv_pci_probe() and hence allows
async-probing (PROBE_PREFER_ASYNCHRONOUS) to work.

Add the async-probing flag to hv_pci_drv.

pci_{lock,unlock}_rescan_remove() in hv_eject_device_work() and in
hv_pci_remove() are still kept: according to the comment before
drivers/pci/probe.c: static DEFINE_MUTEX(pci_rescan_remove_lock),
"PCI device removal routines should always be executed under this mutex".

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org

---

v2:
  No change to the patch body.
  Improved the commit message [Michael Kelley]
  Added Cc:stable

 drivers/pci/controller/pci-hyperv.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
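
The serialization argument in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `model_bus`, `model_probe()` and `run_probes()` are hypothetical names, `global_rescan_lock` merely stands in for `pci_rescan_remove_lock`, and POSIX threads stand in for asynchronous probe workers. It counts how many probes are inside the bus-setup section at the same time. In per-bus mode a barrier forces all probes to overlap so the result is deterministic; the same barrier would deadlock under the global lock, so it is skipped there.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose pthread_barrier_t under strict -std=c11 */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUSES 8

/* Hypothetical stand-in for hv_pcibus_device: only the per-bus lock is modelled. */
struct model_bus {
	pthread_mutex_t state_lock;
};

struct probe_ctx {
	struct model_bus *bus;
	bool use_global_lock;
};

/* Models the single, system-wide pci_rescan_remove_lock (an assumption of this sketch). */
static pthread_mutex_t global_rescan_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t overlap_barrier;
static int inside; /* probes currently in the bus-setup section */
static int peak;   /* highest value "inside" has reached */

static void section_enter(void)
{
	pthread_mutex_lock(&stats_lock);
	if (++inside > peak)
		peak = inside;
	pthread_mutex_unlock(&stats_lock);
}

static void section_leave(void)
{
	pthread_mutex_lock(&stats_lock);
	inside--;
	pthread_mutex_unlock(&stats_lock);
}

/* One probe: always takes its own bus lock, optionally the global lock too. */
static void *model_probe(void *arg)
{
	struct probe_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->bus->state_lock);
	if (ctx->use_global_lock)
		pthread_mutex_lock(&global_rescan_lock);

	section_enter();
	/* Per-bus mode only: wait until every probe is inside the section. */
	if (!ctx->use_global_lock)
		pthread_barrier_wait(&overlap_barrier);
	section_leave();

	if (ctx->use_global_lock)
		pthread_mutex_unlock(&global_rescan_lock);
	pthread_mutex_unlock(&ctx->bus->state_lock);
	return NULL;
}

/* Runs nbus probes in parallel threads and returns the peak overlap observed. */
int run_probes(int nbus, bool use_global_lock)
{
	struct model_bus buses[MAX_BUSES];
	struct probe_ctx ctx[MAX_BUSES];
	pthread_t tid[MAX_BUSES];
	int i, rc;

	assert(nbus >= 1 && nbus <= MAX_BUSES);
	inside = 0;
	peak = 0;
	rc = pthread_barrier_init(&overlap_barrier, NULL, (unsigned int)nbus);
	assert(rc == 0);

	for (i = 0; i < nbus; i++) {
		pthread_mutex_init(&buses[i].state_lock, NULL);
		ctx[i].bus = &buses[i];
		ctx[i].use_global_lock = use_global_lock;
		rc = pthread_create(&tid[i], NULL, model_probe, &ctx[i]);
		assert(rc == 0);
	}
	for (i = 0; i < nbus; i++)
		pthread_join(tid[i], NULL);
	for (i = 0; i < nbus; i++)
		pthread_mutex_destroy(&buses[i].state_lock);
	pthread_barrier_destroy(&overlap_barrier);
	return peak;
}
```

Called from a small `main()`, `run_probes(4, true)` returns 1 because the global lock admits one probe at a time, while `run_probes(4, false)` returns 4 because each probe only contends on its own bus lock. The model says nothing about removal paths, which the patch deliberately keeps under the global mutex.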