Message ID | 51B1C1F3.3070804@fold.natur.cuni.cz (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Martin Mokrejs wrote:
> Hi everybody,
>
> Bjorn Helgaas wrote:
>> [+cc linux-pci, Sarah, Alan]
>>
>> On Mon, Mar 11, 2013 at 10:02 AM, Martin Mokrejs
>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>> [re-sending to you all three directly, looks like the original email did not make it into linux-pci
>>> through some filters]
>>>
>>> I use for my daily work acpiphp to manage express cards in Dell Vostro 3550.
>>> I have never seen something like this before and believe this is some new regression
>>> in 3.8 series. I had a USB3 card in the slot and ejected it. Then I inserted a
>>> SATA SiI3132 card but it is not detected and dmesg still ends with last lines
>>> added when the USB card was being removed. The funny thing is that lspci reports
>>> a mixture of USB-card properties with NEC chips along with Silicon Image eSATA card.
>>
>> I don't know anything about the kmemleaks mentioned elsewhere in this
>> thread, but the idea of "stale PCI device info" seems possibly related
>> to some acpiphp issues we've been working on recently.
>>
>> Starting with v3.9, we don't handle ACPI Bus Check notifications to
>> host bridges correctly, and the result is that when we're using
>> acpiphp, we don't notice when PCI devices are added or removed. There
>> are more details in https://bugzilla.kernel.org/show_bug.cgi?id=57961
>
> Looks to me it is already in 3.10-rc4 which I tested now. No, I still do see
> same problem like before: a hotremoval of NEC-based xHCI express card is detected
> on every second eject. But, sometimes it seems it is only delayed by some 25-30
> seconds. Would have to do more testing. However, there are some *new* kmemleaks
> reported by kernel related to acpiphp but xhci_hcd. That could be a hint why
> the hotremoval sometimes proceeds delayed but sometimes maybe not at all or at
> least not *immediately* like for any other device?
>
> However, the stale sysfs entries for partially removed device SiI3132 (sata_sil24
> driver) are NOT appearing anymore, good. That used to be associated with
> 'sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal?' line.
> Now, I see under 3.10-rc4 the extra message about 'ACPI: Device does not support D3cold'.
> It would be nice if it said what device it is talking about. About upstream root port
> or about my end device (express card)? Is it related to the pcie_aspm= kernel
> commandline option? If yes, please include the relevant info in the message text,
> referring to this being affected by the particular value. At the moment I used:
> pcie_aspm=off
>
> --- dmesg_initial__inserted_eSATA__ejected__inserted__ejected__inserted.txt 2013-06-07 02:53:56.000000000 +0200
> +++ dmesg_initial__inserted_eSATA__ejected__inserted__ejected__inserted__ejected.txt 2013-06-07 02:54:09.000000000 +0200
> @@ -1439,3 +1439,5 @@
>  [ 254.317365] ata12: SATA max UDMA/100 host m128@0xf6c04000 port 0xf6c02000 irq 19
>  [ 256.400454] ata11: SATA link down (SStatus 0 SControl 0)
>  [ 258.493027] ata12: SATA link down (SStatus 0 SControl 0)
> +[ 267.116723] sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal?
> +[ 267.117779] ACPI: Device does not support D3cold
>
>
> So, in my eyes the "stale pci info" issue is fixed in 3.10-rc4 at least under acpiphp and pcie_aspm=off.

And to be even more exact, I had CONFIG_HOTPLUG_PCI_ACPI=y as I see now an updated
v2 patch from Yinghai:
[PATCH v3.9 stable] PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check

Please make sure that whatever I tested in plain 3.10-rc4 is what you had in those bugzilla patches
under https://bugzilla.kernel.org/show_bug.cgi?id=57961 or what Yinghai posted as an update.
Just in case I tested a different version.

Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Martin Mokrejs wrote:
> Martin Mokrejs wrote:
>> Hi everybody,
>>
>> Bjorn Helgaas wrote:
>>> [+cc linux-pci, Sarah, Alan]
>>>
>>> On Mon, Mar 11, 2013 at 10:02 AM, Martin Mokrejs
>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>> [re-sending to you all three directly, looks like the original email did not make it into linux-pci
>>>> through some filters]
>>>>
>>>> I use for my daily work acpiphp to manage express cards in Dell Vostro 3550.
>>>> I have never seen something like this before and believe this is some new regression
>>>> in 3.8 series. I had a USB3 card in the slot and ejected it. Then I inserted a
>>>> SATA SiI3132 card but it is not detected and dmesg still ends with last lines
>>>> added when the USB card was being removed. The funny thing is that lspci reports
>>>> a mixture of USB-card properties with NEC chips along with Silicon Image eSATA card.
>>>
>>> I don't know anything about the kmemleaks mentioned elsewhere in this
>>> thread, but the idea of "stale PCI device info" seems possibly related
>>> to some acpiphp issues we've been working on recently.
>>>
>>> Starting with v3.9, we don't handle ACPI Bus Check notifications to
>>> host bridges correctly, and the result is that when we're using
>>> acpiphp, we don't notice when PCI devices are added or removed. There
>>> are more details in https://bugzilla.kernel.org/show_bug.cgi?id=57961
>>
>> Looks to me it is already in 3.10-rc4 which I tested now. No, I still do see
>> same problem like before: a hotremoval of NEC-based xHCI express card is detected
>> on every second eject. But, sometimes it seems it is only delayed by some 25-30
>> seconds. Would have to do more testing. However, there are some *new* kmemleaks
>> reported by kernel related to acpiphp but xhci_hcd. That could be a hint why
>> the hotremoval sometimes proceeds delayed but sometimes maybe not at all or at
>> least not *immediately* like for any other device?
>>
>> However, the stale sysfs entries for partially removed device SiI3132 (sata_sil24
>> driver) are NOT appearing anymore, good. That used to be associated with
>> 'sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal?' line.
>> Now, I see under 3.10-rc4 the extra message about 'ACPI: Device does not support D3cold'.
>> It would be nice if it said what device it is talking about. About upstream root port
>> or about my end device (express card)? Is it related to the pcie_aspm= kernel
>> commandline option? If yes, please include the relevant info in the message text,
>> referring to this being affected by the particular value. At the moment I used:
>> pcie_aspm=off
>>
>> --- dmesg_initial__inserted_eSATA__ejected__inserted__ejected__inserted.txt 2013-06-07 02:53:56.000000000 +0200
>> +++ dmesg_initial__inserted_eSATA__ejected__inserted__ejected__inserted__ejected.txt 2013-06-07 02:54:09.000000000 +0200
>> @@ -1439,3 +1439,5 @@
>>  [ 254.317365] ata12: SATA max UDMA/100 host m128@0xf6c04000 port 0xf6c02000 irq 19
>>  [ 256.400454] ata11: SATA link down (SStatus 0 SControl 0)
>>  [ 258.493027] ata12: SATA link down (SStatus 0 SControl 0)
>> +[ 267.116723] sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal?
>> +[ 267.117779] ACPI: Device does not support D3cold
>>
>>
>> So, in my eyes the "stale pci info" issue is fixed in 3.10-rc4 at least under acpiphp and pcie_aspm=off.

No, it is not. :(

>
> And to be even more exact, I had CONFIG_HOTPLUG_PCI_ACPI=y as I see now an updated
> v2 patch from Yinghai:
> [PATCH v3.9 stable] PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check
>
> Please make sure that whatever I tested in plain 3.10-rc4 is what you had in those bugzilla patches
> under https://bugzilla.kernel.org/show_bug.cgi?id=57961 or what Yinghai posted as an update.
> Just in case I tested a different version.

Sorry, I was "able" to plug a firewire card into the express card slot faster than
xhci_hcd released the resources of the NEC-based xHCI card that was yet to be hot-removed.
So, like in older kernels, lspci reports a chimeric entry 11:00 made of the NEC card and
the VIA-based firewire card. Upon eject of the VIA card, xhci_hcd released its resources
with the usual messages, including the complaint that
'xhci_hcd 0000:11:00.0: Host not halted after 16000 microseconds.'

Nothing new in dmesg. I would just say that whatever makes xhci_hcd or pcieport slow in
turning PME# to disabled is effectively blocked if I plug some card back into the express
slot. It seems to me the "conclusion" in the past, in Jan-April, was that pcieport is to
blame and not xhci_hcd, and it always seemed to proceed smoothly once 'PME# disabled'
appeared in dmesg.

>
> Martin
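When repeating this test by hand, one way to avoid the race described above is to wait until the removal has visibly finished before inserting the next card. The following is a minimal Python sketch, not part of the original report. It assumes that a 'PME# disabled' line in dmesg is a usable sign that the removal completed (the thread only suggests this), and that the caller supplies the log lines. The function name and the second sample line are hypothetical.

```python
def removal_finished(log_lines, start=0, marker="PME# disabled"):
    """Return True if `marker` occurs in any log line at or after index `start`."""
    return any(marker in line for line in log_lines[start:])


if __name__ == "__main__":
    # The first line is quoted from the report; the second one is invented
    # purely to illustrate the marker and is NOT taken from a real log.
    sample = [
        "xhci_hcd 0000:11:00.0: Host not halted after 16000 microseconds.",
        "pcieport 0000:00:1c.7: PME# disabled",
    ]
    print(removal_finished(sample[:1]))
    print(removal_finished(sample))
```

Passing `start` as the number of lines seen before the eject restricts the check to messages that appeared afterwards.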
--- dmesg_initial__inserted_eSATA__ejected__inserted__ejected__inserted.txt 2013-06-07 02:53:56.000000000 +0200
+++ dmesg_initial__inserted_eSATA__ejected__inserted__ejected__inserted__ejected.txt 2013-06-07 02:54:09.000000000 +0200
@@ -1439,3 +1439,5 @@
 [ 254.317365] ata12: SATA max UDMA/100 host m128@0xf6c04000 port 0xf6c02000 irq 19
 [ 256.400454] ata11: SATA link down (SStatus 0 SControl 0)
 [ 258.493027] ata12: SATA link down (SStatus 0 SControl 0)
+[ 267.116723] sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal?
+[ 267.117779] ACPI: Device does not support D3cold
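The diff above compares two dmesg snapshots taken before and after an eject. The same comparison can be sketched in Python with the standard `difflib` module. This is only an illustration under assumptions: the helper name `new_dmesg_lines` is hypothetical, the snapshots are passed in as lists of lines, and the sample data is copied from the diff above.

```python
import difflib


def new_dmesg_lines(before, after):
    """Return the lines a unified diff marks as added in `after` relative to `before`."""
    diff = difflib.unified_diff(before, after, lineterm="", n=3)
    # Skip the '+++' file header; dmesg lines start with '[', so a real
    # added line cannot be mistaken for that header here.
    return [
        line[1:]
        for line in diff
        if line.startswith("+") and not line.startswith("+++")
    ]


if __name__ == "__main__":
    before = [
        "[ 254.317365] ata12: SATA max UDMA/100 host m128@0xf6c04000 port 0xf6c02000 irq 19",
        "[ 256.400454] ata11: SATA link down (SStatus 0 SControl 0)",
        "[ 258.493027] ata12: SATA link down (SStatus 0 SControl 0)",
    ]
    after = before + [
        "[ 267.116723] sata_sil24: IRQ status == 0xffffffff, PCI fault or device removal?",
        "[ 267.117779] ACPI: Device does not support D3cold",
    ]
    for line in new_dmesg_lines(before, after):
        print(line)
```

Taking one snapshot per insert or eject and listing only the added lines makes it easier to see which messages belong to which hotplug event.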