diff mbox series

[RESEND,v3,1/1] PCI/ERR: don't clobber status after reset_link()

Message ID 20201010221653.2782993-2-hedi.berriche@hpe.com
State Superseded
Headers show
Series PCI/ERR: fix regression introduced by 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") | expand

Commit Message

Hedi Berriche Oct. 10, 2020, 10:16 p.m. UTC
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
broke pcie_do_recovery(): updating status after reset_link() has the ill
side effect of causing recovery to fail if the error status is
PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following
code will *never* run in the case of a successful reset_link()

   177         if (status == PCI_ERS_RESULT_CAN_RECOVER) {
   ...
   181         }

   183         if (status == PCI_ERS_RESULT_NEED_RESET) {
   ...
   192         }

For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not
calling ->slot_reset() (because we skip report_slot_reset()) thus
breaking driver (re)initialisation.

Don't clobber status with the return value of reset_link(); set status
to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and
only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
or PCI_ERS_RESULT_NO_AER_DRIVER.

Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Joerg Roedel <jroedel@suse.com>

Cc: stable@kernel.org # v5.7+
---
 drivers/pci/pcie/err.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Sinan Kaya Oct. 11, 2020, 5:56 p.m. UTC | #1
On 10/10/2020 6:16 PM, Hedi Berriche wrote:
> Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
> broke pcie_do_recovery(): updating status after reset_link() has the ill
> side effect of causing recovery to fail if the error status is
> PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following
> code will *never* run in the case of a successful reset_link()
> 
>    177         if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>    ...
>    181         }
> 
>    183         if (status == PCI_ERS_RESULT_NEED_RESET) {
>    ...
>    192         }
> 
> For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not
> calling ->slot_reset() (because we skip report_slot_reset()) thus
> breaking driver (re)initialisation.
> 
> Don't clobber status with the return value of reset_link(); set status
> to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and
> only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
> or PCI_ERS_RESULT_NO_AER_DRIVER.
> 
> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
> Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com>
> Cc: Russ Anderson <rja@hpe.com>
> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Joerg Roedel <jroedel@suse.com>
> 
> Cc: stable@kernel.org # v5.7+
> ---
>  drivers/pci/pcie/err.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index c543f419d8f9..2730826cfd8a 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	pci_dbg(dev, "broadcast error_detected message\n");
>  	if (state == pci_channel_io_frozen) {
>  		pci_walk_bus(bus, report_frozen_detected, &status);
> -		status = reset_link(dev);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> +		if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) {
>  			pci_warn(dev, "link reset failed\n");
>  			goto failed;
> +		} else {
> +			if (status == PCI_ERS_RESULT_DISCONNECT ||
> +			    status == PCI_ERS_RESULT_NO_AER_DRIVER)
> +				status = PCI_ERS_RESULT_RECOVERED;
>  		}
>  	} else {
>  		pci_walk_bus(bus, report_normal_detected, &status);
> 

Reviewed-by: Sinan Kaya <okaya@kernel.org>
Hedi Berriche Oct. 15, 2020, 3:32 p.m. UTC | #2
On Sun, Oct 11, 2020 at 18:56 Sinan Kaya wrote:
>On 10/10/2020 6:16 PM, Hedi Berriche wrote:
>> Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>> broke pcie_do_recovery(): updating status after reset_link() has the ill
>> side effect of causing recovery to fail if the error status is
>> PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET as the following
>> code will *never* run in the case of a successful reset_link()
>>
>>    177         if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>    ...
>>    181         }
>>
>>    183         if (status == PCI_ERS_RESULT_NEED_RESET) {
>>    ...
>>    192         }
>>
>> For instance in the case of PCI_ERS_RESULT_NEED_RESET we end up not
>> calling ->slot_reset() (because we skip report_slot_reset()) thus
>> breaking driver (re)initialisation.
>>
>> Don't clobber status with the return value of reset_link(); set status
>> to PCI_ERS_RESULT_RECOVERED, in case of successful link reset, if and
>> only if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
>> or PCI_ERS_RESULT_NO_AER_DRIVER.
>>
>> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>> Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com>
>> Cc: Russ Anderson <rja@hpe.com>
>> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Joerg Roedel <jroedel@suse.com>
>>
>> Cc: stable@kernel.org # v5.7+
>> ---
>>  drivers/pci/pcie/err.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index c543f419d8f9..2730826cfd8a 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -165,10 +165,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>  	if (state == pci_channel_io_frozen) {
>>  		pci_walk_bus(bus, report_frozen_detected, &status);
>> -		status = reset_link(dev);
>> -		if (status != PCI_ERS_RESULT_RECOVERED) {
>> +		if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) {
>>  			pci_warn(dev, "link reset failed\n");
>>  			goto failed;
>> +		} else {
>> +			if (status == PCI_ERS_RESULT_DISCONNECT ||
>> +			    status == PCI_ERS_RESULT_NO_AER_DRIVER)
>> +				status = PCI_ERS_RESULT_RECOVERED;
>>  		}
>>  	} else {
>>  		pci_walk_bus(bus, report_normal_detected, &status);
>>
>
>Reviewed-by: Sinan Kaya <okaya@kernel.org>

Thanks for the Reviewed-by, Sinan.

Folks,

Any further comments or is this ripe for acceptance?

Cheers,
Hedi.
diff mbox series

Patch

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index c543f419d8f9..2730826cfd8a 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -165,10 +165,13 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bus(bus, report_frozen_detected, &status);
-		status = reset_link(dev);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
+		if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(dev, "link reset failed\n");
 			goto failed;
+		} else {
+			if (status == PCI_ERS_RESULT_DISCONNECT ||
+			    status == PCI_ERS_RESULT_NO_AER_DRIVER)
+				status = PCI_ERS_RESULT_RECOVERED;
 		}
 	} else {
 		pci_walk_bus(bus, report_normal_detected, &status);