
scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0

Message ID 1616701889-77537-1-git-send-email-ice799@gmail.com (mailing list archive)
State Deferred
Series scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0

Commit Message

Joe Damato March 25, 2021, 7:51 p.m. UTC
Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
replicated for SAS-3.0 HBAs. This change replicates this behavior.

Signed-off-by: Joe Damato <ice799@gmail.com>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Martin K. Petersen April 6, 2021, 4 a.m. UTC | #1
Joe,

> Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
> controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
> replicated for SAS-3.0 HBAs. This change replicates this behavior.

Do you have a system that exhibits problems with ASPM enabled?

Joe Damato April 6, 2021, 7:01 p.m. UTC | #2
On Mon, Apr 5, 2021 at 9:00 PM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> Joe,
>
> > Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
> > controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
> > replicated for SAS-3.0 HBAs. This change replicates this behavior.
>
> Do you have a system that exhibits problems with ASPM enabled?

I am not sure.

I get intermittent messages in dmesg as seen below and stumbled upon
commit ffdadd68af5a while researching, which looked similar.

I haven't found a way to easily or reliably reproduce this issue, but
it surfaces as dmesg reporting an unknown NMI, and all the disks
suddenly going offline. Some sort of controller fault appears to be
occurring, judging by the dmesg line which says "mpt3sas_cm0:
_base_fault_reset_work: Running mpt3sas_dead_ioc thread success."

My naive thought process was that:

- A message from Sreekanth back in ~2016 suggested that ASPM should be
disabled explicitly for SAS-2.0 [1] - perhaps this is also true for
SAS-3.0?
- Not sure, but disabling ASPM for SAS-3.0 probably wouldn't
negatively impact users
- Disabling ASPM explicitly in the driver only has an impact if the
BIOS has given the kernel control of ASPM, but could be a good safeguard.
- It may (or may not) reduce the incidence of this event I sporadically see.
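
Whether ASPM is actually active on a given link can be read from the Link
Control register in the device's PCIe capability (lspci -vv prints it on the
LnkCtl line). The following is only a minimal sketch of decoding that
register. The bit values match the PCI_EXP_LNKCTL_* definitions in the
kernel's pci_regs.h; decode_lnkctl(), aspm_fully_disabled() and struct
aspm_state are hypothetical names made up for this illustration, and mapping
these three bits onto the three states named in the patch is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Control register bits (same values as PCI_EXP_LNKCTL_* in pci_regs.h). */
#define LNKCTL_ASPM_L0S  0x0001u /* L0s entry enabled */
#define LNKCTL_ASPM_L1   0x0002u /* L1 entry enabled */
#define LNKCTL_CLKREQ_EN 0x0100u /* clock power management enabled */

/* Hypothetical container for the decoded bits; not a kernel structure. */
struct aspm_state {
	bool l0s;
	bool l1;
	bool clkpm;
};

/* Split a raw LnkCtl value into the three power-saving bits. */
static struct aspm_state decode_lnkctl(uint16_t lnkctl)
{
	struct aspm_state st = {
		.l0s   = (lnkctl & LNKCTL_ASPM_L0S) != 0,
		.l1    = (lnkctl & LNKCTL_ASPM_L1) != 0,
		.clkpm = (lnkctl & LNKCTL_CLKREQ_EN) != 0,
	};
	return st;
}

/* True when none of L0s, L1 or clock PM is enabled on the link. */
static bool aspm_fully_disabled(uint16_t lnkctl)
{
	struct aspm_state st = decode_lnkctl(lnkctl);

	return !st.l0s && !st.l1 && !st.clkpm;
}
```

For example, a LnkCtl value of 0x0040 (only the common clock bit set) decodes
to everything disabled, while 0x0102 means L1 and clock PM are both enabled.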

Is there a way to induce ASPM events so that I could test this? Or
perhaps can I tweak the fault handler to get more information about
the specific type of fault?
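
One possible starting point for testing, offered as a sketch rather than a
known recipe: the kernel exposes its global ASPM policy in
/sys/module/pcie_aspm/parameters/policy, with the active policy shown in
square brackets (for example "default performance [powersave]
powersupersave"). Checking that value before and after a fault would at least
show whether the kernel is asking for link power saving at all. The helper
name active_aspm_policy() is hypothetical, and the code only parses a line
that has already been read from that file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) policy name from a policy line into out.
 * Returns 0 on success, -1 if no bracketed token is found, if it is
 * empty, or if it does not fit in out (including the trailing NUL).
 */
static int active_aspm_policy(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end = start ? strchr(start, ']') : NULL;
	size_t len;

	if (!start || !end)
		return -1;

	len = (size_t)(end - start - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

Given the line "default performance [powersave] powersupersave", the helper
stores "powersave" in the output buffer and returns 0.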

All in all I figured the change was relatively harmless and could
reduce the incidence of this sporadic NMI I see.

Thanks,
Joe

[1]: https://patchwork.kernel.org/project/linux-scsi/patch/20161228110524.7516-1-ojab@ojab.ru/#20106435

[1513141.713575] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[1513141.713576] Do you have a strange power saving mode enabled?
[1513141.713577] Dazed and confused, but trying to continue
[1513141.839140] mpt3sas_cm0: SAS host is non-operational !!!!
[1513142.867056] mpt3sas_cm0: SAS host is non-operational !!!!
[1513143.890996] mpt3sas_cm0: SAS host is non-operational !!!!
[1513144.914887] mpt3sas_cm0: SAS host is non-operational !!!!
[1513145.934806] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.958724] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.965053] mpt3sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
[1513146.965423] sd 0:0:7:0: [sdh] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.973762] sd 0:0:7:0: [sdh] tag#0 CDB: Read(10) 28 00 d7 72 30 b0 00 00 10 00
[1513146.973764] print_req_error: I/O error, dev sdh, sector 3614585008
[1513146.978754] sd 0:0:6:0: [sdg] tag#29 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978756] sd 0:0:6:0: [sdg] tag#9 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978757] sd 0:0:6:0: [sdg] tag#33 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978759] sd 0:0:6:0: [sdg] tag#33 CDB: Read(10) 28 00 d8 47 30 68 00 00 30 00
[1513146.978760] sd 0:0:6:0: [sdg] tag#9 CDB: Write(10) 2a 00 61 d1 ae 20 00 04 00 00

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 6aa6de7..bc038e4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -11842,6 +11842,8 @@  _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		break;
 	case MPI25_VERSION:
 	case MPI26_VERSION:
+		pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S |
+			PCIE_LINK_STATE_L1 | PCIE_LINK_STATE_CLKPM);
 		/* Use mpt3sas driver host template for SAS 3.0 HBA's */
 		shost = scsi_host_alloc(&mpt3sas_driver_template,
 		  sizeof(struct MPT3SAS_ADAPTER));