Message ID | 20201217171431.502030-3-kbusch@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [1/3] PCI/ERR: Clear status of the reporting device | expand |
Hi Keith, > On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote: > > Overwriting the frozen detected status with the result of the link reset > loses the NEED_RESET result that drivers are depending on for error > handling to report the .slot_reset() callback. Retain this status so > that subsequent error handling has the correct flow. > > Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu> > Signed-off-by: Keith Busch <kbusch@kernel.org> > --- > drivers/pci/pcie/err.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index a84f0bf4c1e2..b576aa890c76 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(bridge, "broadcast error_detected message\n"); > if (state == pci_channel_io_frozen) { > pci_walk_bridge(bridge, report_frozen_detected, &status); > - status = reset_subordinates(bridge); > - if (status != PCI_ERS_RESULT_RECOVERED) { > + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) { > pci_warn(bridge, "subordinate device reset failed\n"); > goto failed; > } This was an interesting scenario for Hinko not getting the NEED_RESET. Reviewed-by: Sean V Kelley <sean.v.kelley@intel.com> > -- > 2.24.1 >
On 12/17/20 6:14 PM, Keith Busch wrote: > Overwriting the frozen detected status with the result of the link reset > loses the NEED_RESET result that drivers are depending on for error > handling to report the .slot_reset() callback. Retain this status so > that subsequent error handling has the correct flow. > > Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu> > Signed-off-by: Keith Busch <kbusch@kernel.org> > --- > drivers/pci/pcie/err.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index a84f0bf4c1e2..b576aa890c76 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(bridge, "broadcast error_detected message\n"); > if (state == pci_channel_io_frozen) { > pci_walk_bridge(bridge, report_frozen_detected, &status); > - status = reset_subordinates(bridge); > - if (status != PCI_ERS_RESULT_RECOVERED) { > + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) { > pci_warn(bridge, "subordinate device reset failed\n"); > goto failed; > } > Dear Keith, I can confirm that with this series of patches, applied to a linux-pci GIT tree kernel, the problem is solved for me. The reset is carried out, the secondary bus and the PCI device beind the bus all recover after injecting an error into the root port. Tested-by: Hinko Kocevar <hinko.kocevar@ess.eu> Thank you!
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index a84f0bf4c1e2..b576aa890c76 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(bridge, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bridge(bridge, report_frozen_detected, &status); - status = reset_subordinates(bridge); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) { pci_warn(bridge, "subordinate device reset failed\n"); goto failed; }
Overwriting the frozen detected status with the result of the link reset loses the NEED_RESET result that drivers are depending on for error handling to report the .slot_reset() callback. Retain this status so that subsequent error handling has the correct flow. Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu> Signed-off-by: Keith Busch <kbusch@kernel.org> --- drivers/pci/pcie/err.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)