Message ID | 20180831212639.10196-17-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI, error handling and hot plug | expand |
On Fri, Aug 31, 2018 at 03:26:39PM -0600, Keith Busch wrote: > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -294,21 +294,20 @@ struct pci_sriov { > static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) > { > - set_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); > + dev->error_state = pci_channel_io_perm_failure; > return 0; > } Back in 2016 when I floated the idea of using error_state to store that the device has been removed, you responded: "I'd be happy if we can reuse that, but concerned about overloading error_state's intended purpose for AER. The conditions under which an 'is_removed' may be set can also create AER events, and the aer driver overrides the error_state." https://spinics.net/lists/linux-pci/msg55417.html Is it guaranteed that AER refrains from writing a different value to error_state once it has been set to pci_channel_io_perm_failure due to removal? If so I'm happy with this patch. Thanks, Lukas
On Sun, 2018-09-02 at 16:39 +0200, Lukas Wunner wrote: > On Fri, Aug 31, 2018 at 03:26:39PM -0600, Keith Busch wrote: > > --- a/drivers/pci/pci.h > > +++ b/drivers/pci/pci.h > > @@ -294,21 +294,20 @@ struct pci_sriov { > > static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) > > { > > - set_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); > > + dev->error_state = pci_channel_io_perm_failure; > > return 0; > > } > > Back in 2016 when I floated the idea of using error_state to store > that the device has been removed, you responded: Wow, lots of activity while I wasn't looking :-) Unfortunately I'll be away for a few weeks... A quick note: > "I'd be happy if we can reuse that, but concerned about overloading > error_state's intended purpose for AER. The conditions under which an > 'is_removed' may be set can also create AER events, and the aer driver > overrides the error_state." > https://spinics.net/lists/linux-pci/msg55417.html > > Is it guaranteed that AER refrains from writing a different value to > error_state once it has been set to pci_channel_io_perm_failure due > to removal? If so I'm happy with this patch. My suggestion to avoid that problem (we have a similar one in theory with EEH which can set error_state from interrupts) is to make error_state an atomic by having the "set" function use cmpxchg to enforce that there is no valid transition from perm_failure. I was hoping to cookup some patches along the line of the RFC I already sent factoring the above, but a number of things here got in the way and I'm about to head out of the country for 3 weeks. Cheers, Ben.
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index ed2965ee6779..20cadf91cd9c 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -294,21 +294,20 @@ struct pci_sriov { bool drivers_autoprobe; /* Auto probing of VFs by driver */ }; -/* pci_dev priv_flags */ -#define PCI_DEV_DISCONNECTED 0 -#define PCI_DEV_ADDED 1 - static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) { - set_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); + dev->error_state = pci_channel_io_perm_failure; return 0; } static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) { - return test_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); + return dev->error_state == pci_channel_io_perm_failure; } +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 + static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) { assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
This patch brings surprise removals and permanent failures together so no longer need separate flags. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/pci.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)