Message ID | 20181127083454.26560-1-Bharat.Bhushan@nxp.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Series | PCI: Mark NXP LS1088 to avoid bus reset bus |
[+cc David, Jan, Alex, Maik, Chris]

On Tue, Nov 27, 2018 at 08:46:33AM +0000, Bharat Bhushan wrote:
> NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> bus reset with e1000e. Link state of device does not comes UP and so
> config space never accessible again.

Previous similar commits:

  822155100e58 ("PCI: Mark Cavium CN8xxx to avoid bus reset")
  8e2e03179923 ("PCI: Mark Atheros AR9580 to avoid bus reset")
  9ac0108c2bac ("PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset")
  c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset")

1) Please make your subject match (remove the spurious "bus" at the
end)

2) This should probably be marked for stable (v3.14 and later, since
the quirk itself appeared in v3.19 and marked for v3.14 and later
stable kernels). Maybe even mark it as "Fixes: c3e59ee4e766..." to
connect it.

3) The 1957:80c0 PCI ID doesn't appear in https://pci-ids.ucw.cz/; can
you add it?

4) Is there a hardware erratum for this? If so, please include the
URL here.

5) Can you reproduce the problem using the same endpoint (e1000e) on a
different system with a different bridge?

6) Have you looked at this with a PCIe analyzer? It would be very
interesting to compare the boot-time or system reboot path with the
individual bus reset path you're fixing.

Since there are several similar reports and they sometimes involve the
same devices (both your patch and 822155100e58 mention e1000e), I'm a
little suspicious that we're doing something wrong in the bus reset
path.

I think bus reset uses Secondary Bus Reset in the Bridge Control
register. That's a generic mechanism that I would expect to be pretty
well-tested. I suspect the BIOS probably uses it in the reboot path,
and the device probably works after that.

So I wonder if the Linux delay isn't quite long enough, or our first
access to the device isn't quite right, e.g., maybe there's some issue
with the bus/device number capture (PCIe r4.0, sec 2.2.6.2).

> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
> ---
>  drivers/pci/quirks.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4700d24e5d55..b9ae4e9f101a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3391,6 +3391,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
>   */
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>
> +/*
> + * NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> + * bus reset. Link state of device does not comes UP and so config space
> + * never accessible again.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x80c0, quirk_no_bus_reset);
> +
>  static void quirk_no_pm_reset(struct pci_dev *dev)
>  {
>  	/*
> --
> 2.19.1
>
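For readers who have not looked at the quirk table the patch extends, here is a minimal userspace sketch of the idea: a list of vendor:device pairs for which a bus reset is to be avoided. It is a model only, not the kernel code. `struct no_bus_reset_id` and `avoid_bus_reset()` are hypothetical names, and the numeric vendor IDs are assumed to match `PCI_VENDOR_ID_ATHEROS`, `PCI_VENDOR_ID_CAVIUM` and `PCI_VENDOR_ID_FREESCALE`. In the kernel the matching is done by the `DECLARE_PCI_FIXUP_HEADER()` machinery, and the quirk function marks the device rather than returning a value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one DECLARE_PCI_FIXUP_HEADER() entry. */
struct no_bus_reset_id {
	uint16_t vendor;
	uint16_t device;
};

/* Entries visible in the diff context above, plus the proposed LS1088 ID. */
static const struct no_bus_reset_id no_bus_reset_ids[] = {
	{ 0x168c, 0x0033 },	/* assumed PCI_VENDOR_ID_ATHEROS */
	{ 0x177d, 0xa100 },	/* assumed PCI_VENDOR_ID_CAVIUM */
	{ 0x1957, 0x80c0 },	/* Freescale/NXP LS1088, as proposed in the patch */
};

/* Return true if the vendor:device pair is listed as "avoid bus reset". */
static bool avoid_bus_reset(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(no_bus_reset_ids) / sizeof(no_bus_reset_ids[0]); i++) {
		if (no_bus_reset_ids[i].vendor == vendor &&
		    no_bus_reset_ids[i].device == device)
			return true;
	}
	return false;
}
```

With this table, `avoid_bus_reset(0x1957, 0x80c0)` is true for the LS1088 bridge, while an unlisted pair such as the 8086:10d3 NIC is not matched. This mirrors the point raised in the review: the quirk applies to the bridge, so every device below it loses the ability to be reset this way.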
On Tue, 27 Nov 2018 09:33:56 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc David, Jan, Alex, Maik, Chris]
>
> On Tue, Nov 27, 2018 at 08:46:33AM +0000, Bharat Bhushan wrote:
> > NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> > bus reset with e1000e. Link state of device does not comes UP and so
> > config space never accessible again.
>
> Previous similar commits:
>
>   822155100e58 ("PCI: Mark Cavium CN8xxx to avoid bus reset")
>   8e2e03179923 ("PCI: Mark Atheros AR9580 to avoid bus reset")
>   9ac0108c2bac ("PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset")
>   c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset")
>
> 1) Please make your subject match (remove the spurious "bus" at the
> end)
>
> 2) This should probably be marked for stable (v3.14 and later, since
> the quirk itself appeared in v3.19 and marked for v3.14 and later
> stable kernels). Maybe even mark it as "Fixes: c3e59ee4e766..." to
> connect it.
>
> 3) The 1957:80c0 PCI ID doesn't appear in https://pci-ids.ucw.cz/; can
> you add it?
>
> 4) Is there a hardware erratum for this? If so, please include the
> URL here.
>
> 5) Can you reproduce the problem using the same endpoint (e1000e) on a
> different system with a different bridge?
>
> 6) Have you looked at this with a PCIe analyzer? It would be very
> interesting to compare the boot-time or system reboot path with the
> individual bus reset path you're fixing.
>
> Since there are several similar reports and they sometimes involve the
> same devices (both your patch and 822155100e58 mention e1000e), I'm a
> little suspicious that we're doing something wrong in the bus reset
> path.

I agree, entirely excluding bus resets is not something to be taken
lightly. It's less than ideal for an endpoint and a fairly major
functional gap for a downstream port. It should really be considered a
last resort.

> I think bus reset uses Secondary Bus Reset in the Bridge Control
> register. That's a generic mechanism that I would expect to be pretty
> well-tested. I suspect the BIOS probably uses it in the reboot path,
> and the device probably works after that.
>
> So I wonder if the Linux delay isn't quite long enough, or our first
> access to the device isn't quite right, e.g., maybe there's some issue
> with the bus/device number capture (PCIe r4.0, sec 2.2.6.2).

Tweaking the delay would be a reasonable solution, though we are seeing
some issues where users with lots of assigned devices that require bus
resets experience long delays as vfio file descriptors are closed
sequentially on exit. So perhaps we could flag downstream ports
requiring an extra delay, if that becomes a solution. Your mention of
the bus/device number also reminds me of the issue we saw on
Threadripper where there were patches proposed to re-write the
secondary and subordinate bus numbers after reset. AMD was able to
resolve that in a firmware update, but there could be something similar
occurring here. Thanks,

Alex

> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
> > ---
> >  drivers/pci/quirks.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4700d24e5d55..b9ae4e9f101a 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3391,6 +3391,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
> >   */
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
> >
> > +/*
> > + * NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> > + * bus reset. Link state of device does not comes UP and so config space
> > + * never accessible again.
> > + */
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x80c0, quirk_no_bus_reset);
> > +
> >  static void quirk_no_pm_reset(struct pci_dev *dev)
> >  {
> >  	/*
> > --
> > 2.19.1
> >
Hi,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, November 27, 2018 9:39 PM
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Bharat Bhushan <bharat.bhushan@nxp.com>; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org; bharatb.yadav@gmail.com; David Daney
> <david.daney@cavium.com>; Jan Glauber <jglauber@cavium.com>; Maik
> Broemme <mbroemme@libmpq.org>; Chris Blake
> <chrisrblake93@gmail.com>
> Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
>
> On Tue, 27 Nov 2018 09:33:56 -0600
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> > [+cc David, Jan, Alex, Maik, Chris]
> >
> > On Tue, Nov 27, 2018 at 08:46:33AM +0000, Bharat Bhushan wrote:
> > > NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> > > bus reset with e1000e. Link state of device does not comes UP and so
> > > config space never accessible again.
> >
> > Previous similar commits:
> >
> >   822155100e58 ("PCI: Mark Cavium CN8xxx to avoid bus reset")
> >   8e2e03179923 ("PCI: Mark Atheros AR9580 to avoid bus reset")
> >   9ac0108c2bac ("PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset")
> >   c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset")
> >
> > 1) Please make your subject match (remove the spurious "bus" at the
> > end)

Will correct; it was added by mistake.

> > 2) This should probably be marked for stable (v3.14 and later, since
> > the quirk itself appeared in v3.19 and marked for v3.14 and later
> > stable kernels). Maybe even mark it as "Fixes: c3e59ee4e766..." to
> > connect it.

Ok.

> > 3) The 1957:80c0 PCI ID doesn't appear in https://pci-ids.ucw.cz/; can
> > you add it?

Yes, I will add it.

> > 4) Is there a hardware erratum for this? If so, please include the
> > URL here.

No h/w errata as of now.

> > 5) Can you reproduce the problem using the same endpoint (e1000e) on a
> > different system with a different bridge?

I have multiple LS1088 boards and I can observe the problem on all of them.
When I use the same PCI device on another NXP board (LS2088), it works fine.

> > 6) Have you looked at this with a PCIe analyzer? It would be very
> > interesting to compare the boot-time or system reboot path with the
> > individual bus reset path you're fixing.

I have not used a PCIe analyzer.

> > Since there are several similar reports and they sometimes involve the
> > same devices (both your patch and 822155100e58 mention e1000e), I'm a
> > little suspicious that we're doing something wrong in the bus reset
> > path.
>
> I agree, entirely excluding bus resets is not something to be taken lightly. It's
> less than ideal for an endpoint and a fairly major functional gap for a
> downstream port. It should really be considered a last resort.
>
> > I think bus reset uses Secondary Bus Reset in the Bridge Control
> > register. That's a generic mechanism that I would expect to be pretty
> > well-tested. I suspect the BIOS probably uses it in the reboot path,
> > and the device probably works after that.
> >
> > So I wonder if the Linux delay isn't quite long enough, or our first
> > access to the device isn't quite right, e.g., maybe there's some issue
> > with the bus/device number capture (PCIe r4.0, sec 2.2.6.2).
>
> Tweaking the delay would be a reasonable solution, though we are seeing
> some issues where users with lots of assigned devices that require bus
> resets experience long delays as vfio file descriptors are closed sequentially
> on exit.

In pci_reset_secondary_bus() I have tried to increase the delay after reset but not helped.
Do I need to add delay at some other place as well?

Thanks
-Bharat

> So perhaps we could flag downstream ports requiring an extra delay,
> if that becomes a solution. Your mention of the bus/device number also
> reminds me of the issue we saw on Threadripper where there were patches
> proposed to re-write the secondary and subordinate bus numbers after
> reset. AMD was able to resolve that in a firmware update, but there could
> be something similar occurring here. Thanks,
>
> Alex
>
> > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
> > > ---
> > >  drivers/pci/quirks.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 4700d24e5d55..b9ae4e9f101a 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -3391,6 +3391,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
> > >   */
> > >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
> > >
> > > +/*
> > > + * NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
> > > + * bus reset. Link state of device does not comes UP and so config space
> > > + * never accessible again.
> > > + */
> > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x80c0, quirk_no_bus_reset);
> > > +
> > >  static void quirk_no_pm_reset(struct pci_dev *dev)
> > >  {
> > >  	/*
> > > --
> > > 2.19.1
> > >
On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan <bharat.bhushan@nxp.com> wrote:
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, November 27, 2018 9:39 PM
> > To: Bjorn Helgaas <helgaas@kernel.org>
> > Cc: Bharat Bhushan <bharat.bhushan@nxp.com>; linux-pci@vger.kernel.org;
> > linux-kernel@vger.kernel.org; bharatb.yadav@gmail.com; David Daney
> > <david.daney@cavium.com>; Jan Glauber <jglauber@cavium.com>; Maik
> > Broemme <mbroemme@libmpq.org>; Chris Blake
> > <chrisrblake93@gmail.com>
> > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> >
> > On Tue, 27 Nov 2018 09:33:56 -0600
> > Bjorn Helgaas <helgaas@kernel.org> wrote:

> > > 4) Is there a hardware erratum for this? If so, please include the
> > > URL here.
>
> No h/w errata as of now.

Does that mean (a) the HW folks agree this is a hardware problem but
they haven't written an erratum, (b) there is an erratum but it isn't
public, (c) we don't have any concrete evidence of a hardware problem,
but things just don't work if we do a bus reset, (d) something else?

> In pci_reset_secondary_bus() I have tried to increase the delay after reset but not helped.
> Do I need to add delay at some other place as well?

No, I think the place you tried should be enough.

You should also be able to exercise this from user-space by using
"setpci" to set and clear the Secondary Bus Reset bit in the Bridge
Control register. Then you can also use setpci to read/write config
space of the NIC. The kernel would normally read the Vendor and
Device IDs as the first access to the device during enumeration.

You also might be able to learn something by using "lspci -vv" on the
bridge before and after the reset to see if it logs any AER bits (if
it supports AER) or the other standard error logging bits.
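The setpci experiment suggested above comes down to setting and then clearing one bit in the bridge's Bridge Control register. The following is a minimal sketch of that bit manipulation only, assuming the standard type 1 header layout (Bridge Control at offset 0x3e, Secondary Bus Reset as bit 6, value 0x40). The macro and helper names are hypothetical. The actual config-space access, the delay between assert and deassert, and the wait before the first access to the device are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match PCI_BRIDGE_CONTROL and PCI_BRIDGE_CTL_BUS_RESET. */
#define BRIDGE_CONTROL_OFFSET	0x3e
#define BRIDGE_CTL_BUS_RESET	0x40

/* Value to write back in order to assert Secondary Bus Reset. */
static uint16_t sbr_assert(uint16_t bridge_ctl)
{
	return (uint16_t)(bridge_ctl | BRIDGE_CTL_BUS_RESET);
}

/* Value to write back in order to release Secondary Bus Reset. */
static uint16_t sbr_deassert(uint16_t bridge_ctl)
{
	return (uint16_t)(bridge_ctl & ~BRIDGE_CTL_BUS_RESET);
}

/* True while the bridge is holding its secondary bus in reset. */
static bool sbr_is_asserted(uint16_t bridge_ctl)
{
	return (bridge_ctl & BRIDGE_CTL_BUS_RESET) != 0;
}
```

A read-modify-write like this preserves the other Bridge Control bits. Writing a literal byte value with setpci is equivalent only when the remaining bits of that byte are already zero.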
Hi,

> -----Original Message-----
> From: Bjorn Helgaas <bhelgaas@google.com>
> Sent: Thursday, November 29, 2018 1:46 AM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>
> Cc: alex.williamson@redhat.com; Bjorn Helgaas <helgaas@kernel.org>;
> linux-pci@vger.kernel.org; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> <david.daney@cavium.com>; jglauber@cavium.com;
> mbroemme@libmpq.org; chrisrblake93@gmail.com
> Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
>
> On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan
> <bharat.bhushan@nxp.com> wrote:
> >
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, November 27, 2018 9:39 PM
> > > To: Bjorn Helgaas <helgaas@kernel.org>
> > > Cc: Bharat Bhushan <bharat.bhushan@nxp.com>;
> > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > bharatb.yadav@gmail.com; David Daney <david.daney@cavium.com>;
> > > Jan Glauber <jglauber@cavium.com>; Maik Broemme
> > > <mbroemme@libmpq.org>; Chris Blake <chrisrblake93@gmail.com>
> > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > >
> > > On Tue, 27 Nov 2018 09:33:56 -0600
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > > 4) Is there a hardware erratum for this? If so, please include
> > > > the URL here.
> >
> > No h/w errata as of now.
>
> Does that mean (a) the HW folks agree this is a hardware problem but they
> haven't written an erratum, (b) there is an erratum but it isn't public, (c) we
> don't have any concrete evidence of a hardware problem, but things just
> don't work if we do a bus reset, (d) something else?

I will say it is (c) - not concluded to be hardware h/w issue.

> > In pci_reset_secondary_bus() I have tried to increase the delay after reset
> > but not helped.
> > Do I need to add delay at some other place as well?
>
> No, I think the place you tried should be enough.
>
> You should also be able to exercise this from user-space by using "setpci" to
> set and clear the Secondary Bus Reset bit in the Bridge Control register. Then
> you can also use setpci to read/write config space of the NIC. The kernel
> would normally read the Vendor and Device IDs as the first access to the
> device during enumeration. You also might be able to learn something by
> using "lspci -vv" on the bridge before and after the reset to see if it logs any
> AER bits (if it supports AER) or the other standard error logging bits.

I tried below sequence for Secondary bus reset and device config space show 0xff

root@localhost:~# lspci -x
0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00

0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00
10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0
30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00

root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00

root@localhost:~# lspci -x
0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00

0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Thanks
-Bharat
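The second dump above shows every byte of the NIC's header reading 0xff, which is the usual sign that config reads are no longer reaching the device. As a minimal sketch, assuming the header bytes have already been copied into a buffer (for example from `lspci -x` output), the check and the little-endian decoding of the Vendor and Device IDs can be written as follows. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor ID is bytes 0-1 of the config header, little-endian. */
static uint16_t cfg_vendor_id(const uint8_t *cfg)
{
	return (uint16_t)(cfg[0] | (cfg[1] << 8));
}

/* Device ID is bytes 2-3 of the config header, little-endian. */
static uint16_t cfg_device_id(const uint8_t *cfg)
{
	return (uint16_t)(cfg[2] | (cfg[3] << 8));
}

/* True if all len bytes read back as 0xff, as in the post-reset dump. */
static bool cfg_all_ones(const uint8_t *cfg, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (cfg[i] != 0xff)
			return false;
	}
	return true;
}
```

Applied to the first row of the pre-reset dump (`86 80 d3 10`), this decodes to 8086:10d3. Applied to the post-reset dump, the Vendor ID decodes to 0xffff, which is not a valid vendor, so the device is treated as absent.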
On Fri, 30 Nov 2018 05:29:47 +0000
Bharat Bhushan <bharat.bhushan@nxp.com> wrote:

> Hi,
>
> > -----Original Message-----
> > From: Bjorn Helgaas <bhelgaas@google.com>
> > Sent: Thursday, November 29, 2018 1:46 AM
> > To: Bharat Bhushan <bharat.bhushan@nxp.com>
> > Cc: alex.williamson@redhat.com; Bjorn Helgaas <helgaas@kernel.org>;
> > linux-pci@vger.kernel.org; Linux Kernel Mailing List
> > <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> > <david.daney@cavium.com>; jglauber@cavium.com;
> > mbroemme@libmpq.org; chrisrblake93@gmail.com
> > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> >
> > On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan
> > <bharat.bhushan@nxp.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, November 27, 2018 9:39 PM
> > > > To: Bjorn Helgaas <helgaas@kernel.org>
> > > > Cc: Bharat Bhushan <bharat.bhushan@nxp.com>;
> > > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > bharatb.yadav@gmail.com; David Daney <david.daney@cavium.com>;
> > > > Jan Glauber <jglauber@cavium.com>; Maik Broemme
> > > > <mbroemme@libmpq.org>; Chris Blake <chrisrblake93@gmail.com>
> > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > > >
> > > > On Tue, 27 Nov 2018 09:33:56 -0600
> > > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > > > 4) Is there a hardware erratum for this? If so, please include
> > > > > the URL here.
> > >
> > > No h/w errata as of now.
> >
> > Does that mean (a) the HW folks agree this is a hardware problem but they
> > haven't written an erratum, (b) there is an erratum but it isn't public, (c) we
> > don't have any concrete evidence of a hardware problem, but things just
> > don't work if we do a bus reset, (d) something else?
>
> I will say it is (c) - not concluded to be hardware h/w issue.
>
> > > In pci_reset_secondary_bus() I have tried to increase the delay after reset
> > > but not helped.
> > > Do I need to add delay at some other place as well?
> >
> > No, I think the place you tried should be enough.
> >
> > You should also be able to exercise this from user-space by using "setpci" to
> > set and clear the Secondary Bus Reset bit in the Bridge Control register. Then
> > you can also use setpci to read/write config space of the NIC. The kernel
> > would normally read the Vendor and Device IDs as the first access to the
> > device during enumeration. You also might be able to learn something by
> > using "lspci -vv" on the bridge before and after the reset to see if it logs any
> > AER bits (if it supports AER) or the other standard error logging bits.
>
> I tried below sequence for Secondary bus reset and device config space show 0xff
>
> root@localhost:~# lspci -x
> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
>
> 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00
> 10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0
> 30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00
>
> root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
> root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00
>
> root@localhost:~# lspci -x
> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00

Just for curiosity sake, what if you re-write the secondary and
subordinate bus registers here:

# setpci -s 0002:00:00.0 0x19.b=0x01
# setpci -s 0002:00:00.0 0x1a.b=0xff

IIRC the users that debugged the AMD bus reset issue re-wrote the
entire 64 bytes of the bridge config header and then further narrowed
the issue down to the two registers above. If one bridge
implementation can have such an issue, maybe others do too. Perhaps
there's common IP in use.

Are you able to test other endpoints besides this e1000e device with
this setpci technique? Thanks,

Alex

> 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> Thanks
> -Bharat
>
>
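The two registers named above sit next to the primary bus number in the bridge's type 1 header. As a minimal sketch, assuming the standard offsets (0x18 primary, 0x19 secondary, 0x1a subordinate) and a 64-byte header already read into a buffer, the values can be pulled out and checked like this. The struct and function names are hypothetical, and the range check only restates the usual forwarding rule for config requests; it does not model whatever internal state the reset might have disturbed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match PCI_PRIMARY_BUS, PCI_SECONDARY_BUS, PCI_SUBORDINATE_BUS. */
#define PRIMARY_BUS_OFFSET	0x18
#define SECONDARY_BUS_OFFSET	0x19
#define SUBORDINATE_BUS_OFFSET	0x1a

/* Hypothetical container for the three bus number registers. */
struct bridge_bus_numbers {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

/* Extract the bus numbers from a type 1 (bridge) config header. */
static struct bridge_bus_numbers read_bus_numbers(const uint8_t *hdr)
{
	struct bridge_bus_numbers n = {
		.primary = hdr[PRIMARY_BUS_OFFSET],
		.secondary = hdr[SECONDARY_BUS_OFFSET],
		.subordinate = hdr[SUBORDINATE_BUS_OFFSET],
	};

	return n;
}

/* A bridge forwards config requests for buses in [secondary, subordinate]. */
static bool bridge_forwards_bus(const struct bridge_bus_numbers *n, uint8_t bus)
{
	return bus >= n->secondary && bus <= n->subordinate;
}
```

For the bridge dump quoted in this message, row `10:` gives primary 0x00, secondary 0x01 and subordinate 0xff, so the registers still claim bus 01 after the reset. That is why re-writing the same values is a test of whether the hardware lost track of them internally, not a change to what software sees.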
Hi Alex,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, November 30, 2018 11:26 AM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; Bjorn Helgaas
> <helgaas@kernel.org>; linux-pci@vger.kernel.org; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> <david.daney@cavium.com>; jglauber@cavium.com;
> mbroemme@libmpq.org; chrisrblake93@gmail.com
> Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
>
> On Fri, 30 Nov 2018 05:29:47 +0000
> Bharat Bhushan <bharat.bhushan@nxp.com> wrote:
>
> > Hi,
> >
> > > -----Original Message-----
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > Sent: Thursday, November 29, 2018 1:46 AM
> > > To: Bharat Bhushan <bharat.bhushan@nxp.com>
> > > Cc: alex.williamson@redhat.com; Bjorn Helgaas <helgaas@kernel.org>;
> > > linux-pci@vger.kernel.org; Linux Kernel Mailing List
> > > <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> > > <david.daney@cavium.com>; jglauber@cavium.com; mbroemme@libmpq.org;
> > > chrisrblake93@gmail.com
> > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > >
> > > On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan
> > > <bharat.bhushan@nxp.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, November 27, 2018 9:39 PM
> > > > > To: Bjorn Helgaas <helgaas@kernel.org>
> > > > > Cc: Bharat Bhushan <bharat.bhushan@nxp.com>;
> > > > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > bharatb.yadav@gmail.com; David Daney <david.daney@cavium.com>;
> > > > > Jan Glauber <jglauber@cavium.com>; Maik Broemme
> > > > > <mbroemme@libmpq.org>; Chris Blake <chrisrblake93@gmail.com>
> > > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > > > >
> > > > > On Tue, 27 Nov 2018 09:33:56 -0600 Bjorn Helgaas
> > > > > <helgaas@kernel.org> wrote:
> > > >
> > > > > > 4) Is there a hardware erratum for this? If so, please
> > > > > > include the URL here.
> > > >
> > > > No h/w errata as of now.
> > >
> > > Does that mean (a) the HW folks agree this is a hardware problem but
> > > they haven't written an erratum, (b) there is an erratum but it
> > > isn't public, (c) we don't have any concrete evidence of a hardware
> > > problem, but things just don't work if we do a bus reset, (d) something
> > > else?
> >
> > I will say it is (c) - not concluded to be hardware h/w issue.
> >
> > > > In pci_reset_secondary_bus() I have tried to increase the delay
> > > > after reset but not helped.
> > > > Do I need to add delay at some other place as well?
> > >
> > > No, I think the place you tried should be enough.
> > >
> > > You should also be able to exercise this from user-space by using
> > > "setpci" to set and clear the Secondary Bus Reset bit in the Bridge
> > > Control register. Then you can also use setpci to read/write config
> > > space of the NIC. The kernel would normally read the Vendor and
> > > Device IDs as the first access to the device during enumeration.
> > > You also might be able to learn something by using "lspci -vv" on
> > > the bridge before and after the reset to see if it logs any AER bits
> > > (if it supports AER) or the other standard error logging bits.
> >
> > I tried below sequence for Secondary bus reset and device config space
> > show 0xff
> >
> > root@localhost:~# lspci -x
> > 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> > 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> > 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> > 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
> >
> > 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> > 00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00
> > 10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40
> > 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0
> > 30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00
> >
> > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
> > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00
> >
> > root@localhost:~# lspci -x
> > 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> > 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> > 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> > 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
>
> Just for curiosity sake, what if you re-write the secondary and subordinate
> bus registers here:
>
> # setpci -s 0002:00:00.0 0x19.b=0x01
> # setpci -s 0002:00:00.0 0x1a.b=0xff

Result is same, here are logs

root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00
root@localhost:~# setpci -s 0002:00:00.0 0x19.b=0x01
root@localhost:~# setpci -s 0002:00:00.0 0x1a.b=0xff
root@localhost:~# lspci -x
0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00

0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

> IIRC the users that debugged the AMD bus reset issue re-wrote the entire 64
> bytes of the bridge config header and then further narrowed the issue down
> to the two registers above. If one bridge implementation can have such an
> issue, maybe others do too. Perhaps there's common IP in use.
>
> Are you able to test other endpoints besides this e1000e device with this
> setpci technique? Thanks,

I tried with "Broadcom Limited NetXtreme BCM5722 Gigabit Ethernet PCI Express"
I observe same issue.

Thanks
-Bharat

> Alex
>
> > 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
> > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >
> > Thanks
> > -Bharat
> >
> >
On Fri, 30 Nov 2018 06:24:16 +0000
Bharat Bhushan <bharat.bhushan@nxp.com> wrote:

> Hi Alex,
>
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, November 30, 2018 11:26 AM
> > To: Bharat Bhushan <bharat.bhushan@nxp.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>; Bjorn Helgaas
> > <helgaas@kernel.org>; linux-pci@vger.kernel.org; Linux Kernel Mailing List
> > <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> > <david.daney@cavium.com>; jglauber@cavium.com;
> > mbroemme@libmpq.org; chrisrblake93@gmail.com
> > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> >
> > On Fri, 30 Nov 2018 05:29:47 +0000
> > Bharat Bhushan <bharat.bhushan@nxp.com> wrote:
> >
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > > Sent: Thursday, November 29, 2018 1:46 AM
> > > > To: Bharat Bhushan <bharat.bhushan@nxp.com>
> > > > Cc: alex.williamson@redhat.com; Bjorn Helgaas <helgaas@kernel.org>;
> > > > linux-pci@vger.kernel.org; Linux Kernel Mailing List
> > > > <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> > > > <david.daney@cavium.com>; jglauber@cavium.com; mbroemme@libmpq.org;
> > > > chrisrblake93@gmail.com
> > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > > >
> > > > On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan
> > > > <bharat.bhushan@nxp.com> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Tuesday, November 27, 2018 9:39 PM
> > > > > > To: Bjorn Helgaas <helgaas@kernel.org>
> > > > > > Cc: Bharat Bhushan <bharat.bhushan@nxp.com>;
> > > > > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > > bharatb.yadav@gmail.com; David Daney <david.daney@cavium.com>;
> > > > > > Jan Glauber <jglauber@cavium.com>; Maik Broemme
> > > > > > <mbroemme@libmpq.org>; Chris Blake <chrisrblake93@gmail.com>
> > > > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > > > > >
> > > > > > On Tue, 27 Nov 2018 09:33:56 -0600 Bjorn Helgaas
> > > > > > <helgaas@kernel.org> wrote:
> > > > >
> > > > > > > 4) Is there a hardware erratum for this? If so, please
> > > > > > > include the URL here.
> > > > >
> > > > > No h/w errata as of now.
> > > >
> > > > Does that mean (a) the HW folks agree this is a hardware problem but
> > > > they haven't written an erratum, (b) there is an erratum but it
> > > > isn't public, (c) we don't have any concrete evidence of a hardware
> > > > problem, but things just don't work if we do a bus reset, (d) something
> > > > else?
> > >
> > > I will say it is (c) - not concluded to be hardware h/w issue.
> > >
> > > > > In pci_reset_secondary_bus() I have tried to increase the delay
> > > > > after reset but not helped.
> > > > > Do I need to add delay at some other place as well?
> > > >
> > > > No, I think the place you tried should be enough.
> > > >
> > > > You should also be able to exercise this from user-space by using
> > > > "setpci" to set and clear the Secondary Bus Reset bit in the Bridge
> > > > Control register. Then you can also use setpci to read/write config
> > > > space of the NIC. The kernel would normally read the Vendor and
> > > > Device IDs as the first access to the device during enumeration.
> > > > You also might be able to learn something by using "lspci -vv" on
> > > > the bridge before and after the reset to see if it logs any AER bits
> > > > (if it supports AER) or the other standard error logging bits.
> > >
> > > I tried below sequence for Secondary bus reset and device config space
> > > show 0xff
> > >
> > > root@localhost:~# lspci -x
> > > 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> > > 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> > > 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> > > 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> > > 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
> > >
> > > 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> > > 00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00
> > > 10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40
> > > 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0
> > > 30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00
> > >
> > > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
> > > root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00
> > >
> > > root@localhost:~# lspci -x
> > > 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> > > 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> > > 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> > > 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> > > 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
> >
> > Just for curiosity sake, what if you re-write the secondary and subordinate
> > bus registers here:
> >
> > # setpci -s 0002:00:00.0 0x19.b=0x01
> > # setpci -s 0002:00:00.0 0x1a.b=0xff
>
> Result is same, here are logs
>
> root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
> root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00
> root@localhost:~# setpci -s 0002:00:00.0 0x19.b=0x01
> root@localhost:~# setpci -s 0002:00:00.0 0x1a.b=0xff
> root@localhost:~# lspci -x
> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00
00 00 00 40 00 00 00 00 00 00 40 63 01 00 00 > > 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff) > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Ok, thanks for scratching my itch. > > IIRC the users that debugged the AMD bus reset issue re-wrote the entire 64 > > bytes of the bridge config header and then further narrowed the issue down > > to the two registers above. If one bridge implementation can have such an > > issue, maybe others do too. Perhaps there's common IP in use. > > > Are you able > > to test other endpoints besides this e1000e device with this setpci > > technique? Thanks, > > I tried with " Broadcom Limited NetXtreme BCM5722 Gigabit Ethernet PCI Express" I observe same issue. Personally I'd exhaust talking with your hardware folks before blocking bus resets at the software level, it seems like a gap in PCIe compliance of the device. Thanks, Alex
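The user-space test discussed above (pulse Secondary Bus Reset on the bridge, then see whether the endpoint still answers config reads) could be scripted roughly as follows. This is only a sketch: the two addresses and the setpci byte writes are taken from the logs in this thread, while the function names and the one-second sleeps are assumptions made for illustration, not values anyone in the thread specified.

```shell
#!/bin/sh
# Sketch only. The addresses come from the logs in this thread; the function
# names and the sleep durations are assumptions made for illustration.
BRIDGE=0002:00:00.0     # bridge whose secondary bus is reset
ENDPOINT=0002:01:00.0   # device behind it (the 82574L in the logs)

# Read "lspci -x" output on stdin. Succeeds (exit 0) only if at least one
# hex-dump row was seen and every dumped byte is ff.
config_is_all_ff() {
    awk '
        /^[0-9a-f]+: / {
            rows++
            for (i = 2; i <= NF; i++)
                if ($i != "ff") bad = 1
        }
        END { if (rows > 0 && !bad) exit 0; exit 1 }
    '
}

# Pulse Secondary Bus Reset (bit 6, value 0x40, of the Bridge Control
# register at offset 0x3e) with the same byte writes used in the thread.
# A plain byte write also zeroes the other bits of that byte; that is
# harmless here only because the byte read back as 00 in the dump.
pulse_secondary_bus_reset() {
    setpci -s "$BRIDGE" 0x3e.b=0x40
    sleep 1
    setpci -s "$BRIDGE" 0x3e.b=0x00
    sleep 1
}

# Report whether the endpoint still answers config reads.
check_endpoint() {
    if lspci -x -s "$ENDPOINT" | config_is_all_ff; then
        echo "$ENDPOINT: config space reads all ff"
    else
        echo "$ENDPOINT: config space readable"
    fi
}
```

Run as root, `pulse_secondary_bus_reset; check_endpoint` would be expected to print the "all ff" message on the affected LS1088 setup, matching the `(rev ff)` dump above. Saving `lspci -vv -s "$BRIDGE"` before and after the pulse and diffing the two files is one way to look for the error-logging bits mentioned earlier.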
Hi Alex,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, November 30, 2018 9:49 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; Bjorn Helgaas
> <helgaas@kernel.org>; linux-pci@vger.kernel.org; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; bharatb.yadav@gmail.com; David Daney
> <david.daney@cavium.com>; jglauber@cavium.com;
> mbroemme@libmpq.org; chrisrblake93@gmail.com
> Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
>
> On Fri, 30 Nov 2018 06:24:16 +0000
> Bharat Bhushan <bharat.bhushan@nxp.com> wrote:
>
> > I tried with " Broadcom Limited NetXtreme BCM5722 Gigabit Ethernet PCI
> > Express" I observe same issue.
>
> Personally I'd exhaust talking with your hardware folks before blocking bus
> resets at the software level, it seems like a gap in PCIe compliance of the
> device. Thanks,

I will continue to work with our h/w team on this.

Thanks
-Bharat

>
> Alex
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 4700d24e5d55..b9ae4e9f101a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3391,6 +3391,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
  */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
 
+/*
+ * NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
+ * bus reset. Link state of device does not comes UP and so config space
+ * never accessible again.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x80c0, quirk_no_bus_reset);
+
 static void quirk_no_pm_reset(struct pci_dev *dev)
 {
 	/*
NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after
bus reset with e1000e. Link state of device does not comes UP and so
config space never accessible again.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
---
 drivers/pci/quirks.c | 7 +++++++
 1 file changed, 7 insertions(+)
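The quirk in the patch matches the bridge by vendor and device ID (the Freescale vendor ID and 0x80c0, i.e. the 1957:80c0 ID mentioned in the review). As a quick cross-check, those IDs can be read from the first four bytes of an `lspci -x` dump, where each 16-bit field is stored little-endian. The helper below is a hypothetical illustration (the name `ids_from_dump` is not from the thread), not part of the patch.

```shell
#!/bin/sh
# Sketch only: ids_from_dump is an illustrative helper name.
# Read "lspci -x" output on stdin and print vendor:device, taken from
# bytes 0-3 of the config header (two little-endian 16-bit fields).
ids_from_dump() {
    awk '$1 == "00:" { printf "%s%s:%s%s\n", $3, $2, $5, $4; exit }'
}

# Example with the first row of the bridge dump from the thread.
printf '00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00\n' | ids_from_dump
# prints 1957:80c0
```

On a live system, `lspci -x -s 0002:00:00.0 | ids_from_dump` would do the same for the bridge; `lspci -n` shows the numeric IDs directly.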