diff mbox series

[3/3] PCI/ERR: Retain status from error notification

Message ID 20201217171431.502030-3-kbusch@kernel.org (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series [1/3] PCI/ERR: Clear status of the reporting device | expand

Commit Message

Keith Busch Dec. 17, 2020, 5:14 p.m. UTC
Overwriting the frozen detected status with the result of the link reset
loses the NEED_RESET result that drivers are depending on for error
handling to report the .slot_reset() callback. Retain this status so
that subsequent error handling has the correct flow.

Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pcie/err.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Kelley, Sean V Jan. 4, 2021, 6:43 p.m. UTC | #1
Hi Keith,

> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
> 
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
> 
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/err.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index a84f0bf4c1e2..b576aa890c76 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> 	pci_dbg(bridge, "broadcast error_detected message\n");
> 	if (state == pci_channel_io_frozen) {
> 		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		status = reset_subordinates(bridge);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> 			pci_warn(bridge, "subordinate device reset failed\n");
> 			goto failed;
> 		}

This was an interesting scenario for Hinko not getting the NEED_RESET.

Reviewed-by: Sean V Kelley <sean.v.kelley@intel.com>

> -- 
> 2.24.1
>
Hinko Kocevar Jan. 5, 2021, 2:12 p.m. UTC | #2
On 12/17/20 6:14 PM, Keith Busch wrote:
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
> 
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/pci/pcie/err.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index a84f0bf4c1e2..b576aa890c76 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	pci_dbg(bridge, "broadcast error_detected message\n");
>   	if (state == pci_channel_io_frozen) {
>   		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		status = reset_subordinates(bridge);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>   			pci_warn(bridge, "subordinate device reset failed\n");
>   			goto failed;
>   		}
> 

Dear Keith,

I can confirm that with this series of patches, applied to a linux-pci 
GIT tree kernel, the problem is solved for me. The reset is carried out, 
the secondary bus and the PCI device beind the bus all recover after 
injecting an error into the root port.

Tested-by: Hinko Kocevar <hinko.kocevar@ess.eu>

Thank you!
diff mbox series

Patch

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index a84f0bf4c1e2..b576aa890c76 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -198,8 +198,7 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(bridge, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bridge(bridge, report_frozen_detected, &status);
-		status = reset_subordinates(bridge);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
+		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(bridge, "subordinate device reset failed\n");
 			goto failed;
 		}