From patchwork Wed Dec 6 14:03:55 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Laight X-Patchwork-Id: 10095909 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id DD08460329 for ; Wed, 6 Dec 2017 14:03:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CF79728068 for ; Wed, 6 Dec 2017 14:03:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C3F762875C; Wed, 6 Dec 2017 14:03:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1A4B528068 for ; Wed, 6 Dec 2017 14:03:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751815AbdLFODi convert rfc822-to-8bit (ORCPT ); Wed, 6 Dec 2017 09:03:38 -0500 Received: from smtp-out4.electric.net ([192.162.216.185]:57141 "EHLO smtp-out4.electric.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751598AbdLFODh (ORCPT ); Wed, 6 Dec 2017 09:03:37 -0500 Received: from 1eMaIP-0003I2-Vw by out4d.electric.net with emc1-ok (Exim 4.87) (envelope-from ) id 1eMaIQ-0003Jt-UH for linux-pci@vger.kernel.org; Wed, 06 Dec 2017 06:03:34 -0800 Received: by emcmailer; Wed, 06 Dec 2017 06:03:34 -0800 Received: from [156.67.243.126] (helo=AcuMS.aculab.com) by out4d.electric.net with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) (Exim 4.87) (envelope-from ) id 1eMaIP-0003I2-Vw for linux-pci@vger.kernel.org; Wed, 06 Dec 2017 06:03:33 -0800 Received: from AcuMS.Aculab.com (fd9f:af1c:a25b:0:43c:695e:880f:8750) by AcuMS.aculab.com (fd9f:af1c:a25b:0:43c:695e:880f:8750) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Wed, 6 Dec 2017 14:03:55 +0000 Received: from AcuMS.Aculab.com ([fe80::43c:695e:880f:8750]) by AcuMS.aculab.com ([fe80::43c:695e:880f:8750%12]) with mapi id 15.00.1347.000; Wed, 6 Dec 2017 14:03:55 +0000 From: David Laight To: "'linux-pci@vger.kernel.org'" Subject: PCIe link not recovering Thread-Topic: PCIe link not recovering Thread-Index: AdNumw4M+hfLnbIlQb+cL2w0WEDPRQ== Date: Wed, 6 Dec 2017 14:03:55 +0000 Message-ID: <3d3c69b6dcc44963b1ad80120c69767b@AcuMS.aculab.com> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.202.205.33] MIME-Version: 1.0 X-Outbound-IP: 156.67.243.126 X-Env-From: David.Laight@ACULAB.COM X-Proto: esmtps X-Revdns: X-HELO: AcuMS.aculab.com X-TLS: TLSv1.2:ECDHE-RSA-AES256-SHA384:256 X-Authenticated_ID: X-PolicySMART: 3396946, 3397078 X-Virus-Status: Scanned by VirusSMART (c) X-Virus-Status: Scanned by VirusSMART (s) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If I perform the following: 1) echo 1 >/sys/devices/pcixxxx/xxx/remove 2) completely reset the PCIe endpoint 3) echo 1 >/sys/devices/pcixxxx/rescan I expect the endpoint to be reprobed (provided the BARs are compatible). However on a new motherboard (SkyLake) it looks as though the root bridge isn't trying to bring the PCIe link back up. (The same system disk works fine in a slighty older system.) There are 2 bits different in the lspci output for the root port between the 'link up' and 'link down states': The first just looks like a notification that there has been an error. The last one might imply that it knows it needs to do something - but needs to be 'kicked'. I believe that the endpoint is flipping between the 'Detect Active' and 'Detect Quiet' states. Which would imply that the root port isn't trying to establish the link. Does the kernel have to prod the root complex somehow? David The full output (link_down) is: 00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [88] Subsystem: Super Micro Computer Inc Skylake PCIe Controller (x16) Capabilities: [80] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Address: fee00218 Data: 0000 Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #2, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <256ns, L1 <8us ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #1, PowerLimit 75.000W; Interlock- NoCompl+ SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet+ LinkState- RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna+ CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending+ InProgress- Capabilities: [140 v1] Root Complex Link Desc: PortNumber=02 ComponentID=01 EltType=Config Link0: Desc: TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+ Addr: 00000000fed19000 Capabilities: [d94 v1] #19 Kernel driver in use: pcieport Kernel modules: shpchp --- link_up 2017-12-06 13:37:37.523661300 +0000 +++ link_down 2017-12-06 13:37:37.509614300 +0000 @@ -21,7 +21,7 @@ DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes - DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- + DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #2, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <256ns, L1 <8us ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ @@ -51,7 +51,7 @@ VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff - Status: NegoPending- InProgress- + Status: NegoPending+ InProgress- Capabilities: [140 v1] Root Complex Link Desc: PortNumber=02 ComponentID=01 EltType=Config Link0: Desc: TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+