scsi: aacraid: fix PCI error recovery path

Message ID 20170406211209.27129-1-gpiccoli@linux.vnet.ibm.com (mailing list archive)
State Accepted, archived

Commit Message

Guilherme G. Piccoli April 6, 2017, 9:12 p.m. UTC
During PCI error recovery, if aac_check_health() is not aware that
a PCI error happened and the PCI channel is offline, it might
trigger errors (such as a NULL pointer dereference) and prevent the
error recovery process from completing.

This patch makes the health check procedure aware of PCI channel
issues: while the PCI channel is offline, aac_adapter_check_health()
returns -1, which lets the recovery process complete successfully.
The patch was tested on upstream kernel v4.11-rc5 on the PowerPC
ppc64le architecture with adapter 9005:028d (VID:DID); the error
recovery procedure completed successfully.

Fixes: 5c63f7f710bd ("aacraid: Added EEH support")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 drivers/scsi/aacraid/aacraid.h | 11 ++++++++---
 drivers/scsi/aacraid/commsup.c |  3 ++-
 2 files changed, 10 insertions(+), 4 deletions(-)
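
The guard this patch adds can be tried outside the kernel with a small user-space sketch. Everything prefixed with mock_ below is a hypothetical stand-in invented for illustration (the real struct aac_dev, struct pci_dev and pci_channel_offline() live in the kernel and look different); only the control flow is meant to mirror the patched aac_adapter_check_health(): return -1 while the channel is offline, otherwise forward to the adapter hook.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel definition. */
struct mock_pci_dev {
	bool channel_offline;	/* models what pci_channel_offline() reports */
};

struct mock_aac_dev;

/* Hypothetical stand-in for the adapter ops table. */
struct mock_adapter_ops {
	int (*adapter_check_health)(struct mock_aac_dev *dev);
};

/* Hypothetical stand-in for struct aac_dev. */
struct mock_aac_dev {
	struct mock_pci_dev *pdev;
	struct mock_adapter_ops a_ops;
	int hw_reads;		/* counts calls that reached the hook */
};

static bool mock_pci_channel_offline(const struct mock_pci_dev *pdev)
{
	return pdev->channel_offline;
}

/* Pretend hardware hook: always reports a healthy adapter (0). */
static int mock_hw_check_health(struct mock_aac_dev *dev)
{
	dev->hw_reads++;
	return 0;
}

/* Same shape as the patched helper: bail out before touching the hook. */
static int mock_adapter_check_health(struct mock_aac_dev *dev)
{
	if (mock_pci_channel_offline(dev->pdev))
		return -1;

	return dev->a_ops.adapter_check_health(dev);
}

/* Exercises both paths of the guard. */
static void demo(void)
{
	struct mock_pci_dev pdev = { .channel_offline = true };
	struct mock_aac_dev dev = {
		.pdev = &pdev,
		.a_ops = { .adapter_check_health = mock_hw_check_health },
		.hw_reads = 0,
	};

	/* Channel offline: the guard short-circuits, the hook is not called. */
	assert(mock_adapter_check_health(&dev) == -1);
	assert(dev.hw_reads == 0);

	/* Channel back online: the call is forwarded to the hook. */
	pdev.channel_offline = false;
	assert(mock_adapter_check_health(&dev) == 0);
	assert(dev.hw_reads == 1);
}
```

Calling demo() from a main() runs both cases. Because the offline check sits in the wrapper rather than in each adapter hook, every caller of the wrapper gets the same behavior; in the real driver the negative return value lands in the BlinkLED < 0 branch of aac_check_health().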

Comments

Dave Carroll April 6, 2017, 9:46 p.m. UTC | #1
> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
> Sent: Thursday, April 06, 2017 3:12 PM
> To: dl-esc-Aacraid Linux Driver <aacraid@microsemi.com>
> Cc: gpiccoli@linux.vnet.ibm.com; linux-scsi@vger.kernel.org; Raghava Aditya
> Renukunta <RaghavaAditya.Renukunta@microsemi.com>
> Subject: [PATCH] scsi: aacraid: fix PCI error recovery path

Reviewed-by: Dave Carroll <david.carroll@microsemi.com>

Martin K. Petersen April 12, 2017, 12:46 a.m. UTC | #2

Applied to 4.11/scsi-fixes, thanks!

Patch

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index d036a806f31c..d281492009fb 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1690,9 +1690,6 @@  struct aac_dev
 #define aac_adapter_sync_cmd(dev, command, p1, p2, p3, p4, p5, p6, status, r1, r2, r3, r4) \
 	(dev)->a_ops.adapter_sync_cmd(dev, command, p1, p2, p3, p4, p5, p6, status, r1, r2, r3, r4)
 
-#define aac_adapter_check_health(dev) \
-	(dev)->a_ops.adapter_check_health(dev)
-
 #define aac_adapter_restart(dev, bled, reset_type) \
 	((dev)->a_ops.adapter_restart(dev, bled, reset_type))
 
@@ -2615,6 +2612,14 @@  static inline unsigned int cap_to_cyls(sector_t capacity, unsigned divisor)
 	return capacity;
 }
 
+static inline int aac_adapter_check_health(struct aac_dev *dev)
+{
+	if (unlikely(pci_channel_offline(dev->pdev)))
+		return -1;
+
+	return (dev)->a_ops.adapter_check_health(dev);
+}
+
 /* SCp.phase values */
 #define AAC_OWNER_MIDLEVEL	0x101
 #define AAC_OWNER_LOWLEVEL	0x102
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index c8172f16cf33..1f4918355fdb 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1873,7 +1873,8 @@  int aac_check_health(struct aac_dev * aac)
 	spin_unlock_irqrestore(&aac->fib_lock, flagv);
 
 	if (BlinkLED < 0) {
-		printk(KERN_ERR "%s: Host adapter dead %d\n", aac->name, BlinkLED);
+		printk(KERN_ERR "%s: Host adapter is dead (or got a PCI error) %d\n",
+				aac->name, BlinkLED);
 		goto out;
 	}