Message ID | 20180817044902.31420-3-benh@kernel.crashing.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | pci: Rework is_added race fix and address bridge enable races | expand |
On Fri, Aug 17, 2018 at 02:48:58PM +1000, Benjamin Herrenschmidt wrote: > This re-fixes the bug reported by Hari Vyas <hari.vyas@broadcom.com> > after my revert of his commit but in a much simpler way. > > The main issues is that is_added was being set after the driver > got bound and started, and thus setting it could race with other > changes to struct pci_dev. The "bind driver, then set dev->added = 1" order seems to have been there since the beginning of dev->is_added: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8a1bc9013a03 This patch seems reasonable, but I'm a little dubious about the existence of "is_added" in the first place. As far as I can tell, the only other buses with something similar are the MEN Chameleon bus and the Intel Management Engine Interface. The PCI uses of "is_added" don't seem *that* critical or unique to PCI, so I'm not 100% convinced we need it at all. But I haven't looked into it enough to be able to propose an alternative. > This fixes it by setting the flag first, which also has the > advantage of matching the fact that we are clearing it *after* > unbinding in the remove path, thus the flag is now symtetric > and always set while the driver code is running. > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > --- > drivers/pci/bus.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c > index 35b7fc87eac5..48ae63673aa8 100644 > --- a/drivers/pci/bus.c > +++ b/drivers/pci/bus.c > @@ -321,16 +321,16 @@ void pci_bus_add_device(struct pci_dev *dev) > pci_proc_attach_device(dev); > pci_bridge_d3_update(dev); > > + dev->is_added = 1; > dev->match_driver = true; > retval = device_attach(&dev->dev); > if (retval < 0 && retval != -EPROBE_DEFER) { > + dev->is_added = 0; > pci_warn(dev, "device attach failed (%d)\n", retval); > pci_proc_detach_device(dev); > pci_remove_sysfs_dev_files(dev); > return; > } > - > - dev->is_added = 1; > } > EXPORT_SYMBOL_GPL(pci_bus_add_device); > > -- > 2.17.1 >
On Fri, Aug 17, 2018 at 11:25:34AM -0500, Bjorn Helgaas wrote: > The "bind driver, then set dev->added = 1" order seems to have been > there since the beginning of dev->is_added: > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8a1bc9013a03 > > This patch seems reasonable, but I'm a little dubious about the > existence of "is_added" in the first place. As far as I can tell, the > only other buses with something similar are the MEN Chameleon bus and > the Intel Management Engine Interface. > > The PCI uses of "is_added" don't seem *that* critical or unique to > PCI, so I'm not 100% convinced we need it at all. But I haven't > looked into it enough to be able to propose an alternative. Addition of new PCI devices, e.g. on boot or on hotplug, consists of two stages: First, new devices are created by pci_scan_slot(), afterwards they're bound to a driver by pci_bus_add_devices(). The idea is that the bus is scanned in full before drivers are attached to devices. In the second step, i.e. in pci_bus_add_devices(), the is_added flag is set on devices. The flag is significant because pci_scan_slot() returns the number of newly discovered PCI functions in the given slot, and it calculates that number based on the is_added flag. More specifically, it invokes pci_scan_single_device() which either returns an existing device or a newly created device. The is_added flag is basically a way for pci_scan_single_device() to communicate back to pci_scan_slot() which of the two code paths it took. The two steps (enumeration and driver attachment) are protected by pci_lock_rescan_remove(). Initially, when a pci_dev is newly allocated with kzalloc(), is_added is 0. The transition from 0 to 1 happens while under pci_lock_rescan_remove(). When that lock is released, is_added is always 1, barring a device_attach() error, in which case it will remain at 0. AFAICS, there is no second mutex needed to ensure synchronization of is_added, the existing mutex should be sufficient and the only problem are RMW races when accessing adjacent flags. Those were correctly addressed by Hari's patch. Benjamin seems to be alleging that concurrency issues exist beyond the RMW races, I fail to see them, sorry. Thanks, Lukas
On Fri, 2018-08-17 at 11:25 -0500, Bjorn Helgaas wrote: > On Fri, Aug 17, 2018 at 02:48:58PM +1000, Benjamin Herrenschmidt wrote: > > This re-fixes the bug reported by Hari Vyas <hari.vyas@broadcom.com> > > after my revert of his commit but in a much simpler way. > > > > The main issues is that is_added was being set after the driver > > got bound and started, and thus setting it could race with other > > changes to struct pci_dev. > > The "bind driver, then set dev->added = 1" order seems to have been > there since the beginning of dev->is_added: > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8a1bc9013a03 > > This patch seems reasonable, but I'm a little dubious about the > existence of "is_added" in the first place. As far as I can tell, the > only other buses with something similar are the MEN Chameleon bus and > the Intel Management Engine Interface. > > The PCI uses of "is_added" don't seem *that* critical or unique to > PCI, so I'm not 100% convinced we need it at all. But I haven't > looked into it enough to be able to propose an alternative. This is a whole different conversation you are taking us into :-) is_added is currently needed for a number of reasons, mostly relating to partial hotplug, and historically comes from the fact that we separated the PCI probing & tree construction from the registration with the device-model. This of course comes from the fact that the device model didn't actually exist yet when the PCI code was created :-) So let's keep things separate shall we ? I'd rather fix this correctly now, and get rid of that pesky atomic priv_flags which I think is just going to be a long term add to the mess rather than an improvement, and separately we can discuss whether is_added is something that can go away, but I suspect this will come in the form of either a deeper rework of how we do PCI probing, or simply finding a struct device/kobj field we can use as a hint that we've added the device already for hotplug. > > This fixes it by setting the flag first, which also has the > > advantage of matching the fact that we are clearing it *after* > > unbinding in the remove path, thus the flag is now symtetric > > and always set while the driver code is running. > > > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > --- > > drivers/pci/bus.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c > > index 35b7fc87eac5..48ae63673aa8 100644 > > --- a/drivers/pci/bus.c > > +++ b/drivers/pci/bus.c > > @@ -321,16 +321,16 @@ void pci_bus_add_device(struct pci_dev *dev) > > pci_proc_attach_device(dev); > > pci_bridge_d3_update(dev); > > > > + dev->is_added = 1; > > dev->match_driver = true; > > retval = device_attach(&dev->dev); > > if (retval < 0 && retval != -EPROBE_DEFER) { > > + dev->is_added = 0; > > pci_warn(dev, "device attach failed (%d)\n", retval); > > pci_proc_detach_device(dev); > > pci_remove_sysfs_dev_files(dev); > > return; > > } > > - > > - dev->is_added = 1; > > } > > EXPORT_SYMBOL_GPL(pci_bus_add_device); > > > > -- > > 2.17.1 > >
On Fri, 2018-08-17 at 20:15 +0200, Lukas Wunner wrote: > > The two steps (enumeration and driver attachment) are protected by > pci_lock_rescan_remove(). Initially, when a pci_dev is newly allocated > with kzalloc(), is_added is 0. The transition from 0 to 1 happens while > under pci_lock_rescan_remove(). When that lock is released, is_added is > always 1, barring a device_attach() error, in which case it will remain > at 0. > > AFAICS, there is no second mutex needed to ensure synchronization of > is_added, the existing mutex should be sufficient and the only problem > are RMW races when accessing adjacent flags. Those were correctly addressed > by Hari's patch. Benjamin seems to be alleging that concurrency issues > exist beyond the RMW races, I fail to see them, sorry. Im saying that fixing those issues using atomic bitops is a generally unsafe practice even if it happens to work in a specific case. In this one, I argue that the root problem was simply that is_added was set in the wrong place. The "fix" from Hari touches 9 files, adds horrible relative includes to an architecture and generally bloats things while a single 3 liner would have been sufficient. The current use of is_added is asymetric (it's cleared during device_attach but set during detach) which is bogus and the entire race stems from the fact that it is set after device_attach returns. So setting it early is, imho, the right fix, is much simpler, and fixes the odd imbalance we have to begin with. Ben.
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 35b7fc87eac5..48ae63673aa8 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -321,16 +321,16 @@ void pci_bus_add_device(struct pci_dev *dev) pci_proc_attach_device(dev); pci_bridge_d3_update(dev); + dev->is_added = 1; dev->match_driver = true; retval = device_attach(&dev->dev); if (retval < 0 && retval != -EPROBE_DEFER) { + dev->is_added = 0; pci_warn(dev, "device attach failed (%d)\n", retval); pci_proc_detach_device(dev); pci_remove_sysfs_dev_files(dev); return; } - - dev->is_added = 1; } EXPORT_SYMBOL_GPL(pci_bus_add_device);
This re-fixes the bug reported by Hari Vyas <hari.vyas@broadcom.com> after my revert of his commit but in a much simpler way. The main issues is that is_added was being set after the driver got bound and started, and thus setting it could race with other changes to struct pci_dev. This fixes it by setting the flag first, which also has the advantage of matching the fact that we are clearing it *after* unbinding in the remove path, thus the flag is now symtetric and always set while the driver code is running. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- drivers/pci/bus.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)